Abstract

We study infinite stochastic games played by two players on a finite
graph with goals specified by sets of infinite traces. The games are
concurrent (each player simultaneously and independently chooses an
action at each round), stochastic (the next state is determined by a
probability distribution depending on the current state and the chosen
actions), infinite (the game continues for an infinite number of rounds),
nonzero-sum (the players' goals are not necessarily conflicting), and
undiscounted. We show that if each player has an ω-regular objective
expressed as a parity objective, then there exists an ε-Nash equilibrium,
for every ε > 0. However, exact Nash equilibria need not exist. We
study the complexity of finding the values (payoff profile) of some ε-Nash
equilibrium. We show that the values of some ε-Nash equilibrium in
nonzero-sum concurrent parity games can be computed by solving the
following two simpler problems: computing the values of zero-sum (the
goals of the players are strictly conflicting) concurrent parity games and
computing ε-Nash equilibrium values of nonzero-sum concurrent games
with reachability objectives. As a consequence we establish that the values
of some ε-Nash equilibrium can be approximated in FNP (functional
NP), and hence in EXPTIME.
9988172,  and  CCR-0225610,  and  by  the  NSF  Career  grant  CCR-0132780,  the  NSF  grant 
CCR-0234690,  and  by  the  ONR  grant  N00014-02-1-0671 


1  Introduction 


Stochastic games. Non-cooperative games provide a natural framework
to model interactions between agents [17, 19]. The simplest class of
non-cooperative games consists of the "one-step" games: games with a single
interaction between the agents, after which the game ends and the payoffs
are decided (e.g., matrix games). However, a wide class of games progress
over time and in a stateful manner, and the current game depends on the
history of interactions. Infinite stochastic games [21, 8] are a natural model
for such games. A stochastic game is played in rounds over a finite state
space. In concurrent games, in each round, each player chooses
an action from a finite set of available actions, simultaneously and
independently of the other players. The game proceeds to a new state according
to a probabilistic transition relation (stochastic transition matrix) based on
the current state and the joint actions of the players. Concurrent games
subsume the simpler class of turn-based games, where at every state at most
one player can choose between multiple actions. In verification and control
of finite-state reactive systems such games proceed for infinitely many rounds,
generating an infinite sequence of states, called the outcome of the game. The
players receive a payoff based on a payoff function that maps every outcome
to a real number.

Objectives. Payoffs are generally Borel measurable functions [15]. For
example, the payoff set for each player $i$ is a Borel set $B_i$ in the Cantor
topology on $S^\omega$ (where $S$ is the set of states), and player $i$ gets payoff 1 if
the outcome of the game is a member of $B_i$, and 0 otherwise. In verification,
payoff functions are usually index sets of ω-regular languages. The ω-regular
languages generalize the classical regular languages to infinite strings; they
occur in low levels of the Borel hierarchy (they are in $\Sigma_3 \cap \Pi_3$), and form a
robust and expressive language for determining payoffs for commonly used
specifications [14]. The simplest ω-regular objectives correspond to safety
("closed sets") and reachability ("open sets") objectives.

Zero-sum games. Games may be zero-sum, where two players have
directly conflicting objectives and the payoff of one player is one minus the
payoff of the other, or nonzero-sum, where each player has a prescribed
payoff function based on the outcome of the game. The fundamental
question for games is the existence of equilibrium values. For zero-sum games,
this involves showing a determinacy theorem that states that the expected
optimum value obtained by player 1 is exactly one minus the expected
optimum value obtained by player 2. For one-step zero-sum games, this is von
Neumann's minmax theorem [31]. For infinite games, the existence of such
equilibria is not obvious; in fact, by using the axiom of choice, one can
construct games for which determinacy does not hold. However, a remarkable
result by Martin [15] shows that all stochastic zero-sum games with Borel
payoffs are determined.

Nonzero-sum games. For nonzero-sum games, the fundamental equilibrium
concept is a Nash equilibrium [11], that is, a strategy profile such that
no player can gain by deviating from the profile, assuming the other player
continues playing the strategy in the profile. Again, for one-step games, the
existence of such equilibria is guaranteed by Nash's theorem [11]. However,
the existence of Nash equilibria in infinite games is not immediate: Nash's
theorem holds for finite bimatrix games, but in the case of stochastic games, the
strategy space is not compact. The existence of Nash equilibria is known
only in very special cases of stochastic games. In fact, Nash equilibria may
not exist, and the best one can hope for is an ε-Nash equilibrium for all
ε > 0, where an ε-Nash equilibrium is a strategy profile where unilateral
deviation can only increase the payoff of a player by at most ε. Exact Nash
equilibria do exist in discounted stochastic games [9], and other special cases
[26, 27]. For concurrent nonzero-sum games with payoffs defined by Borel
sets, surprisingly little is known. Secchi and Sudderth [20] showed that exact
Nash equilibria do exist when all players have payoffs defined by closed sets
("safety objectives"), where the objective of each player is to stay within
a certain set of good states. Formally, each player $i$ has a subset of states
$F_i$ as their safe states, and gets payoff 1 if the play never leaves the set
$F_i$ and payoff 0 otherwise. This result was generalized to general state
and action spaces [20, 13], where only ε-equilibria exist. In the case of open
sets ("reachability objectives"), each player $i$ has a subset of states $R_i$ as
reachability targets. Player $i$ gets payoff 1 if the outcome visits some state
from $R_i$ at some point, and 0 otherwise. The existence of ε-Nash
equilibrium in games with payoffs described as open sets, for every ε > 0, has
been established in [5]. The above results hold even in the case of $n$-player
games. In one of the most important recent results in stochastic game theory,
Vieille shows the existence of ε-Nash equilibrium, for every ε > 0, in
two-player concurrent games with limit-average payoff [29, 30]. The existence
of ε-Nash equilibrium in two-player concurrent games with objectives in the
higher levels of the Borel hierarchy has been an intriguing open problem.

Our result and proof techniques. In this paper we show that an ε-Nash
equilibrium exists, for every ε > 0, for two-player concurrent games with
ω-regular objectives. However, exact Nash equilibria need not exist. For
two-player concurrent games our result extends the existence of ε-Nash
equilibrium from the lowest level of the Borel hierarchy (open and closed sets) to
the classical ω-regular objectives that lie in the higher levels of the Borel
hierarchy; and our result for ω-regular objectives parallels Vieille's result for
limit-average objectives. Our proof technique involves the following key
ideas:

1. We first show the existence of ε-Nash equilibrium, for every ε > 0,
with ω-regular objectives, for a sub-class of concurrent games, namely
single strongly connected component (Sscc) games, in Section 3.

2. We extend the above result to all concurrent games in Section 4.

The result for Sscc games involves the following key ideas:

• We identify four sufficient conditions that ensure the existence of ε-Nash
equilibrium, for every ε > 0, in Sscc games.

• We then show that if the sufficient conditions are not satisfied then
the game can be reduced to a nonzero-sum game with reachability
objectives, with some desired properties. The result is proved by
generalizing a result from [2] and using a fragment of the analysis of Vieille [29].

• The existence of ε-Nash equilibrium, for all ε > 0, in the original game
is then established by the use of punishing or spoiling strategies.

Complexity of ε-Nash equilibrium. Computing the values of a Nash
equilibrium, when it exists, is another challenging problem [18, 32]. For
one-step zero-sum games, equilibrium values and strategies can be computed in
polynomial time (by reduction to linear programming) [17]. For one-step
nonzero-sum games, no polynomial time algorithm to compute an exact
Nash equilibrium in a two-player game is known [18]. In the case of concurrent
games with limit-average payoff no algorithmic analysis is known even for
zero-sum games. However, algorithms are known for several special
cases, e.g., for turn-based games [33, 1, 10]. In the case of zero-sum concurrent
games with ω-regular objectives several algorithms are known to compute
values within ε-approximation [7, 2]. Since the values can be irrational,
ε-approximation is the best one can achieve. From the computational perspective,
a desirable property of an existence proof of Nash equilibrium is its ease
of algorithmic analysis. We show that our proof for the existence of ε-Nash
equilibrium is completely constructive and algorithmic. Our proof shows
that the computation of values of some ε-Nash equilibrium in two-player
concurrent games with parity objectives can be reduced to the following
two simpler problems:

1. Computing values of zero-sum concurrent games with parity
objectives.

2. Computing values of some special ε-Nash equilibrium of nonzero-sum
concurrent games with reachability objectives.

Since solving the more general case of nonzero-sum games must involve
solving the special case of zero-sum games, our result reduces the problem of
computing ε-Nash equilibrium for ω-regular objectives to solving some
special ε-Nash equilibrium of games with reachability objectives. We then prove
that the equilibrium values of some ε-Nash equilibrium can be approximated
in FNP (functional NP) and hence in EXPTIME. Our result matches the
best known complexity bound for the simpler case of turn-based games [5].

Organization. The paper is organized as follows. In Section 2 we define
the basic notions of games, strategies and objectives. In Section 3 we prove
the existence of ε-Nash equilibrium for a sub-class of concurrent games, and
then extend the result to all concurrent games in Section 4. We present the
complexity result for computing ε-Nash equilibrium values in Section 5. We
conclude with a few open problems in Section 6.

2  Preliminaries 

Notation. For a countable set $A$, a probability distribution on $A$ is a
function $\delta : A \to [0,1]$ such that $\sum_{a \in A} \delta(a) = 1$. We denote the set of
probability distributions on $A$ by $\mathcal{D}(A)$. Given a distribution
$\delta \in \mathcal{D}(A)$, we denote by
$\mathrm{Supp}(\delta) = \{ x \in A \mid \delta(x) > 0 \}$ the support of $\delta$.

Definition 1 (Concurrent Games) A (two-player) concurrent game
structure $\mathcal{G} = \langle S, \mathit{Moves}, \Gamma_1, \Gamma_2, \delta \rangle$ consists of the following components:

• A finite state space $S$.

• A finite set $\mathit{Moves}$ of moves.

• Two move assignments $\Gamma_1, \Gamma_2 : S \to 2^{\mathit{Moves}} \setminus \emptyset$. For $i \in \{1,2\}$,
assignment $\Gamma_i$ associates with each state $s \in S$ the non-empty set
$\Gamma_i(s) \subseteq \mathit{Moves}$ of moves available to player $i$ at state $s$.

• A probabilistic transition function $\delta : S \times \mathit{Moves} \times \mathit{Moves} \to \mathcal{D}(S)$,
that gives the probability $\delta(s, a_1, a_2)(t)$ of a transition from $s$ to $t$ when
player 1 plays move $a_1$ and player 2 plays move $a_2$, for all $s, t \in S$ and
$a_1 \in \Gamma_1(s)$, $a_2 \in \Gamma_2(s)$. ■
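
As an illustration only, the components of a concurrent game structure can be written down as a small data structure. The sketch below is a minimal one under stated assumptions: the class name `ConcurrentGame`, its field names, and the toy game `EXAMPLE` are hypothetical and do not come from the paper; probabilities are exact fractions so that the distribution check is not affected by rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ConcurrentGame:
    """Hypothetical container for a structure <S, Moves, Gamma1, Gamma2, delta>."""
    states: frozenset
    moves: frozenset
    gamma1: dict  # state -> non-empty set of moves available to player 1
    gamma2: dict  # state -> non-empty set of moves available to player 2
    delta: dict   # (s, a1, a2) -> {successor state: probability}

    def validate(self) -> bool:
        """Raise ValueError if a requirement on the components is violated."""
        for s in self.states:
            for gamma in (self.gamma1, self.gamma2):
                if not gamma[s] or not set(gamma[s]) <= self.moves:
                    raise ValueError(f"bad move assignment at state {s!r}")
            for a1 in self.gamma1[s]:
                for a2 in self.gamma2[s]:
                    dist = self.delta[(s, a1, a2)]
                    if not set(dist) <= self.states:
                        raise ValueError(f"unknown successor from {s!r}")
                    negative = any(p < 0 for p in dist.values())
                    if negative or sum(dist.values()) != 1:
                        raise ValueError(f"not a distribution at {s!r}")
        return True


# Toy game (an assumption for illustration): in state "s" both players pick
# "a" or "b"; state "t" has a single move for each player and loops forever.
HALF = Fraction(1, 2)
ONE = Fraction(1)
EXAMPLE = ConcurrentGame(
    states=frozenset({"s", "t"}),
    moves=frozenset({"a", "b"}),
    gamma1={"s": {"a", "b"}, "t": {"a"}},
    gamma2={"s": {"a", "b"}, "t": {"a"}},
    delta={
        ("s", "a", "a"): {"s": HALF, "t": HALF},
        ("s", "a", "b"): {"t": ONE},
        ("s", "b", "a"): {"t": ONE},
        ("s", "b", "b"): {"s": ONE},
        ("t", "a", "a"): {"t": ONE},
    },
)
```

Calling `EXAMPLE.validate()` checks that every move assignment is non-empty and that every joint move yields a probability distribution over states.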

We distinguish the following special classes of concurrent game structures.

• A concurrent game structure $\mathcal{G}$ is deterministic if for all $s \in S$ and all
$a_1 \in \Gamma_1(s)$, $a_2 \in \Gamma_2(s)$, there is a $t \in S$ such that $\delta(s, a_1, a_2)(t) = 1$.

• A concurrent game structure $\mathcal{G}$ is turn-based if at every state at most
one player can choose among multiple moves; that is, if for every state
$s \in S$ there exists at most one $i \in \{1, 2\}$ with $|\Gamma_i(s)| > 1$.

• A concurrent game structure is a Markov decision process (MDP) if
there exists at most one $i \in \{1, 2\}$ such that $|\Gamma_i(s)| > 1$ at some state $s$.
In other words, an MDP is a one-player stochastic game: only
one player has a non-trivial choice of moves, and for the other player
the choice of moves is fixed.

We define the size of the game structure $\mathcal{G}$ to be equal
to the size of the transition function $\delta$; specifically,
$|\mathcal{G}| = \sum_{s \in S} \sum_{a \in \Gamma_1(s)} \sum_{b \in \Gamma_2(s)} \sum_{t \in S} |\delta(s, a, b)(t)|$,
where $|\delta(s, a, b)(t)|$ denotes the
space to specify the probability distribution. We write $n$ to denote the size
of the state space, i.e., $n = |S|$. At every state $s \in S$, player 1 chooses a
move $a_1 \in \Gamma_1(s)$, and simultaneously and independently player 2 chooses
a move $a_2 \in \Gamma_2(s)$. The game then proceeds to the successor state $t$ with
probability $\delta(s, a_1, a_2)(t)$, for all $t \in S$. A state $s$ is called an absorbing
state if for all $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$ we have $\delta(s, a_1, a_2)(s) = 1$. In
other words, at $s$ for all choices of moves of the players the next state is
always $s$. A state $s$ is a turn-based state if there exists $i \in \{1, 2\}$ such
that $|\Gamma_i(s)| = 1$. Moreover, if $|\Gamma_2(s)| = 1$ then the state $s$ is a player-1
turn-based state since the choice of moves for player 2 is trivial; and if
$|\Gamma_1(s)| = 1$ then it is a player-2 turn-based state. We assume that the players
act non-cooperatively, i.e., each player chooses her strategy independently
and secretly from the other player, and is only interested in maximizing her
own payoff. For all states $s \in S$ and moves $a_1 \in \Gamma_1(s)$ and $a_2 \in \Gamma_2(s)$, we
indicate by $\mathrm{Dest}(s, a_1, a_2) = \mathrm{Supp}(\delta(s, a_1, a_2))$ the set of possible successors
of $s$ when moves $a_1, a_2$ are selected.
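
The state-level notions above (Dest, absorbing states, turn-based states) are simple enough to sketch directly. The following is a minimal sketch assuming a plain-dictionary encoding of $\Gamma_1$, $\Gamma_2$ and $\delta$; the function names and the toy game are hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def dest(delta, s, a1, a2):
    """Dest(s, a1, a2): the support of the distribution delta(s, a1, a2)."""
    return {t for t, p in delta[(s, a1, a2)].items() if p > 0}


def is_absorbing(gamma1, gamma2, delta, s):
    """True if every pair of available moves keeps the game in s with probability 1."""
    return all(
        delta[(s, a1, a2)].get(s, 0) == 1
        for a1 in gamma1[s]
        for a2 in gamma2[s]
    )


def turn_based_roles(gamma1, gamma2, s):
    """Return the labels under which s counts as a turn-based state.

    s is a player-1 turn-based state when player 2 has a single move, and a
    player-2 turn-based state when player 1 has a single move. The result is
    empty when both players have a real choice at s.
    """
    roles = set()
    if len(gamma2[s]) == 1:
        roles.add("player-1")
    if len(gamma1[s]) == 1:
        roles.add("player-2")
    return roles


# Toy game (assumed for illustration): a genuinely concurrent state "s"
# and an absorbing state "t".
HALF, ONE = Fraction(1, 2), Fraction(1)
GAMMA1 = {"s": {"a", "b"}, "t": {"a"}}
GAMMA2 = {"s": {"a", "b"}, "t": {"a"}}
DELTA = {
    ("s", "a", "a"): {"s": HALF, "t": HALF},
    ("s", "a", "b"): {"t": ONE},
    ("s", "b", "a"): {"t": ONE},
    ("s", "b", "b"): {"s": ONE},
    ("t", "a", "a"): {"t": ONE},
}
```

In this toy game `dest(DELTA, "s", "a", "a")` contains both states, `"t"` is absorbing, and `"s"` is not a turn-based state because both players choose between two moves there.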

A path or a play $\omega$ of $\mathcal{G}$ is an infinite sequence $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ of states
in $S$ such that for all $k \geq 0$, there are moves $a_1^k \in \Gamma_1(s_k)$ and $a_2^k \in \Gamma_2(s_k)$
with $\delta(s_k, a_1^k, a_2^k)(s_{k+1}) > 0$. We denote by $\Omega$ the set of all paths and by
$\Omega_s$ the set of all paths $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ such that $s_0 = s$, i.e., the set of plays
starting from state $s$.

2.1  Randomized  strategies 

A selector $\xi$ for player $i \in \{1, 2\}$ is a function $\xi : S \to \mathcal{D}(\mathit{Moves})$ such that for
all $s \in S$ and $a \in \mathit{Moves}$, if $\xi(s)(a) > 0$ then $a \in \Gamma_i(s)$. We denote by $\Lambda_i$ the
set of all selectors for player $i \in \{1, 2\}$. A selector $\xi$ is pure if for every $s \in S$
there is $a \in \mathit{Moves}$ such that $\xi(s)(a) = 1$; we denote by $\Lambda_i^P \subseteq \Lambda_i$ the set of
pure selectors for player $i$. A strategy for player 1 is a function $\sigma : S^+ \to \Lambda_1$
that associates with every finite non-empty sequence of states, representing
the history of the play so far, a selector. Similarly we define strategies $\pi$ for
player 2. A strategy $\sigma$ for player $i$ is pure if it yields only pure selectors,
that is, it is of type $S^+ \to \Lambda_i^P$. A strategy with memory can be described as a
pair of functions: (a) a memory update function $\sigma_u : S \times M \times \mathit{Moves} \to M$, and
(b) a next move function $\sigma_m : S \times M \to \Lambda_1$, where $M$ is the set of memory
states. A strategy with memory is finite-memory if $M$ is finite. A memoryless
strategy is independent of the history
of the play and depends only on the current state. Memoryless strategies
coincide with selectors, and we often write $\sigma$ for the selector corresponding
to a memoryless strategy $\sigma$. A strategy is pure memoryless if it is pure and
memoryless. We denote by $\Sigma^P$, $\Sigma^F$, $\Sigma^{PM}$ the families of pure, finite-memory
and pure memoryless strategies for player 1, respectively. Analogously we
define the families of strategies for player 2. We denote by $\Sigma$ and $\Pi$ the sets
of all strategies for player 1 and player 2, respectively.

Once the starting state $s$ and the strategies $\sigma$ and $\pi$ for the two players
have been chosen, the game is reduced to an ordinary stochastic process.
Hence, the probabilities of events are uniquely defined, where an event
$\mathcal{A} \subseteq \Omega_s$ is a measurable set of paths. For an event $\mathcal{A} \subseteq \Omega_s$ we denote by
$\Pr_s^{\sigma,\pi}(\mathcal{A})$
the probability that a path belongs to $\mathcal{A}$ when the game starts from $s$ and
the players follow the strategies $\sigma$ and $\pi$. For $i \geq 0$, we also denote by
$\Theta_i : \Omega_s \to S$ the random variable denoting the $i$-th state along a path.
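
For the special case in which both players use memoryless strategies (selectors), the resulting stochastic process is a finite Markov chain whose transition probabilities are obtained by averaging $\delta$ over the two selectors. The sketch below illustrates only this special case; general strategies depend on the history and do not reduce to a chain on $S$. The function names and the toy game are assumptions made for illustration.

```python
from fractions import Fraction


def induced_chain(states, delta, xi1, xi2):
    """Markov chain induced by selectors xi1 (player 1) and xi2 (player 2).

    chain[s][t] = sum over a1, a2 of xi1[s][a1] * xi2[s][a2] * delta[(s, a1, a2)][t].
    """
    chain = {}
    for s in states:
        row = {t: Fraction(0) for t in states}
        for a1, p1 in xi1[s].items():
            for a2, p2 in xi2[s].items():
                for t, p in delta[(s, a1, a2)].items():
                    row[t] += p1 * p2 * p
        chain[s] = row
    return chain


def prefix_probability(chain, prefix):
    """Probability that a play starts with the given finite state sequence,
    given that it starts in prefix[0]."""
    prob = Fraction(1)
    for current, successor in zip(prefix, prefix[1:]):
        prob *= chain[current][successor]
    return prob


# Toy game (assumed): in "s" the play moves to the absorbing state "t"
# exactly when both players pick the same move, and stays in "s" otherwise.
ONE, HALF = Fraction(1), Fraction(1, 2)
STATES = ["s", "t"]
DELTA = {
    ("s", "a", "a"): {"t": ONE},
    ("s", "a", "b"): {"s": ONE},
    ("s", "b", "a"): {"s": ONE},
    ("s", "b", "b"): {"t": ONE},
    ("t", "a", "a"): {"t": ONE},
}
UNIFORM = {"s": {"a": HALF, "b": HALF}, "t": {"a": ONE}}
CHAIN = induced_chain(STATES, DELTA, UNIFORM, UNIFORM)
```

With both players randomizing uniformly at `"s"`, each round leaves `"s"` with probability 1/2, so `prefix_probability(CHAIN, ["s", "s", "t"])` equals 1/4.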

2.2  Objectives. 

An objective for a player in a game $\mathcal{G}$ is a set $W \subseteq \Omega$ of infinite paths. We
consider the following objectives.

• Reachability objective. For a set $R \subseteq S$ of target states, the
Reachability objective is defined as
$\mathrm{Reach}(R) = \{ \langle s_0, s_1, s_2, \ldots \rangle \in \Omega \mid \exists k \in \mathbb{N}.\ s_k \in R \}$.

• Safety objective. For a set $F \subseteq S$ of safe states, the Safety objective
is defined as $\mathrm{Safe}(F) = \{ \langle s_0, s_1, s_2, \ldots \rangle \in \Omega \mid \forall k \in \mathbb{N}.\ s_k \in F \}$. Note
that $\Omega \setminus \mathrm{Reach}(R) = \mathrm{Safe}(S \setminus R)$. Hence the reachability objective
with target set $R$ is complementary to the safety objective with safe
set $S \setminus R$.

• Parity objective. Given $d \in \mathbb{N}$, we write $[d]$ for the set $\{0, 1, 2, \ldots, d\}$
and $[d]_+$ for the set $\{1, 2, \ldots, d\}$. Let $p : S \to [d]$ be a function that
assigns a priority $p(s)$ to every state $s \in S$. For an
infinite path $\omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega$, we define
$\mathrm{Inf}(\omega) = \{ i \in [d] \mid p(s_k) = i \text{ for infinitely many } k \geq 0 \}$. The parity objective is defined
as $\mathrm{Parity}(p) = \{ \omega \in \Omega \mid \min(\mathrm{Inf}(\omega)) \text{ is even} \}$. Informally we say that
a path $\omega$ satisfies the parity objective $\mathrm{Parity}(p)$ if $\omega \in \mathrm{Parity}(p)$.
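
To make the three objectives concrete, they can be evaluated on ultimately periodic plays, i.e. plays of the form "finite prefix followed by a cycle repeated forever". This restriction is an assumption of the sketch (arbitrary infinite plays have no finite representation); for such a play, the priorities seen infinitely often are exactly those on the cycle. All function names below are hypothetical.

```python
def inf_priorities(priority, cycle):
    """Inf(omega) for the play prefix . cycle^omega: priorities on the cycle."""
    return {priority[s] for s in cycle}


def satisfies_parity(priority, prefix, cycle):
    """True if the minimal priority visited infinitely often is even.

    The prefix does not influence the result; it is accepted only so that
    all three checks share the same signature.
    """
    if not cycle:
        raise ValueError("the cycle of an ultimately periodic play is non-empty")
    return min(inf_priorities(priority, cycle)) % 2 == 0


def satisfies_reach(targets, prefix, cycle):
    """True if some state of the play lies in the target set."""
    return any(s in targets for s in list(prefix) + list(cycle))


def satisfies_safe(safe, prefix, cycle):
    """True if every state of the play lies in the safe set."""
    return all(s in safe for s in list(prefix) + list(cycle))


# Assumed priorities for a three-state example.
PRIORITY = {"u": 1, "v": 2, "w": 0}
```

For instance, the play that visits `"w"` once and then alternates between `"u"` and `"v"` forever sees priorities 1 and 2 infinitely often, so its minimal recurring priority is odd and the parity objective fails, even though priority 0 occurs in the prefix.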

The ability to solve games with Rabin-chain (parity) objectives suffices
for solving games with arbitrary LTL (or ω-regular) objectives: in fact, it
suffices to encode the ω-regular objective as a deterministic Rabin-chain
automaton, and then to solve the game consisting of the synchronous product of
the original game with the Rabin-chain automaton [16, 24].

A concurrent nonzero-sum parity game consists of a game structure $\mathcal{G}$
and two priority functions $p_1$ and $p_2$ for player 1 and player 2, respectively.
The objectives of player 1 and player 2 are $\mathrm{Parity}(p_1)$ and $\mathrm{Parity}(p_2)$,
respectively. In general we write $\Psi$ for an arbitrary parity objective. We write
the objectives of player 1 and player 2 as $\Psi_1$ and $\Psi_2$, respectively, where $\Psi_1$
and $\Psi_2$ are arbitrary ω-regular objectives formalized as parity objectives.
We also use $\Psi_1$ to denote the set of paths $\omega \in \Omega$ such that $\omega \in \mathrm{Parity}(p_1)$.
Similarly we write $\Psi_2$ to denote the set of paths $\mathrm{Parity}(p_2)$. Given a parity
objective $\Psi$, the set of paths $\Psi$ is measurable for any choice of strategies for
the two players [28]. Given a state $s$ we write $\Psi_{1s}$ to denote $\Omega_s \cap \Psi_1$ and
similarly we write $\Psi_{2s}$ to denote $\Omega_s \cap \Psi_2$. We also write $\Psi_s$ to denote $\Omega_s \cap \Psi$.
Hence, the probability that a path satisfies objective $\Psi$ starting from state
$s \in S$ under strategies $\sigma, \pi$ for the two players is $\Pr_s^{\sigma,\pi}(\Psi_s)$.

Concurrent zero-sum games. A concurrent game is zero-sum if the
objectives of the players are complementary, i.e., $\Psi_1 = \Omega \setminus \Psi_2$. The zero-sum
values for the players in a zero-sum concurrent game are defined as follows.

Definition 2 (Zero-sum values) Given a state $s \in S$, we call the maximal
probability with which player 1 can ensure that $\Psi_1$ holds from $s$ against any
strategy of player 2 the zero-sum value of player 1 at $s$. The zero-sum
value for player 2 is defined symmetrically. Formally, the zero-sum values
for player 1 and player 2 are given by the functions
$\langle\langle 1 \rangle\rangle_{val}(\Psi_1) : S \to [0,1]$ and
$\langle\langle 2 \rangle\rangle_{val}(\Psi_2) : S \to [0,1]$, defined for all $s \in S$ by

$$\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) = \sup_{\sigma \in \Sigma} \inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Psi_{1s})$$

$$\langle\langle 2 \rangle\rangle_{val}(\Psi_2)(s) = \sup_{\pi \in \Pi} \inf_{\sigma \in \Sigma} \Pr_s^{\sigma,\pi}(\Psi_{2s}).$$


Concurrent zero-sum games satisfy a quantitative version of determinacy
[15], stating that for all parity objectives $\Psi_1$ and $\Psi_2$ such that $\Psi_1 = \Omega \setminus \Psi_2$,
and all $s \in S$, we have

$$\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) + \langle\langle 2 \rangle\rangle_{val}(\Psi_2)(s) = 1.$$

A strategy $\sigma$ for player 1 is optimal if for all $s \in S$ we have

$$\inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Psi_{1s}) = \langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s).$$

For $\varepsilon > 0$, a strategy $\sigma$ for player 1 is ε-optimal if for all $s \in S$ we have

$$\inf_{\pi \in \Pi} \Pr_s^{\sigma,\pi}(\Psi_{1s}) \geq \langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) - \varepsilon.$$

We define optimal and ε-optimal strategies for player 2 symmetrically. Note
that the quantitative determinacy of concurrent zero-sum games is
equivalent to the existence of ε-optimal strategies for both players for all ε > 0, at
all states $s \in S$.

Definition 3 (Cooperative value) Given an objective $\Psi$ we define the
cooperative value of the game as the maximal probability with which player 1
and player 2 can cooperate to satisfy the objective $\Psi$. Formally, the
cooperative value is given by the function $\langle\langle 1,2 \rangle\rangle_{val}(\Psi) : S \to [0,1]$ defined for all
$s \in S$ by

$$\langle\langle 1,2 \rangle\rangle_{val}(\Psi)(s) = \sup_{(\sigma,\pi) \in \Sigma \times \Pi} \Pr_s^{\sigma,\pi}(\Psi_s).$$

Note that the computation of the cooperative value function $\langle\langle 1,2 \rangle\rangle_{val}(\Psi)$
can be interpreted as the computation of a value function in an MDP with
objective $\Psi$, where player 1 and player 2 cooperatively choose strategies.

Definition 4 (ε-Nash equilibrium) Let $\mathcal{G}$ be a game and let the
objectives for player 1 and player 2 be $\Psi_1$ and $\Psi_2$, respectively. For $\varepsilon \geq 0$, a
strategy profile $(\sigma^*, \pi^*) \in \Sigma \times \Pi$ is an ε-Nash equilibrium for a state $s \in S$
iff the following two conditions hold:

$$\forall \sigma \in \Sigma.\ \Pr_s^{\sigma,\pi^*}(\Psi_{1s}) \leq \Pr_s^{\sigma^*,\pi^*}(\Psi_{1s}) + \varepsilon$$

$$\forall \pi \in \Pi.\ \Pr_s^{\sigma^*,\pi}(\Psi_{2s}) \leq \Pr_s^{\sigma^*,\pi^*}(\Psi_{2s}) + \varepsilon.$$

An exact Nash equilibrium is a 0-Nash equilibrium. ■
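
The two inequalities of the definition can be checked mechanically in the one-step (bimatrix) setting mentioned in the introduction. The sketch below covers only that finite analogue, not the infinite games studied here: each player picks one row or column, payoffs are read off two matrices, and it suffices to compare against pure deviations because the expected payoff is linear in the deviating player's mixed strategy. The function names and the example matrices are assumptions for illustration.

```python
from fractions import Fraction


def expected(payoff, x, y):
    """Expected payoff when the row player mixes with x and the column player with y."""
    return sum(
        x[i] * y[j] * payoff[i][j]
        for i in range(len(x))
        for j in range(len(y))
    )


def is_eps_nash(payoff1, payoff2, x, y, eps):
    """Check that no unilateral deviation gains more than eps for either player."""
    base1 = expected(payoff1, x, y)
    base2 = expected(payoff2, x, y)
    # Best pure deviation of the row player against the fixed mixture y.
    best1 = max(
        sum(y[j] * payoff1[i][j] for j in range(len(y)))
        for i in range(len(x))
    )
    # Best pure deviation of the column player against the fixed mixture x.
    best2 = max(
        sum(x[i] * payoff2[i][j] for i in range(len(x)))
        for j in range(len(y))
    )
    return best1 <= base1 + eps and best2 <= base2 + eps


# Matching-pennies style example with 0/1 payoffs (assumed for illustration):
# the row player wins on a match, the column player wins on a mismatch.
PAYOFF1 = [[1, 0], [0, 1]]
PAYOFF2 = [[0, 1], [1, 0]]
HALF = Fraction(1, 2)
UNIFORM = [HALF, HALF]
PURE_FIRST = [Fraction(1), Fraction(0)]
```

Here the uniform profile is an exact equilibrium (ε = 0). If the row player instead commits to the first row while the column player stays uniform, the column player can gain exactly 1/2 by deviating, so the profile is an ε-Nash equilibrium for ε = 1/2 but not for ε = 1/4.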

Note that in zero-sum concurrent games with parity
objectives optimal strategies need not exist, and only the existence of ε-optimal
strategies can be guaranteed, for all ε > 0. Hence in the general case of
nonzero-sum concurrent games with parity objectives a Nash equilibrium need
not exist, and the existence of an ε-Nash equilibrium, for all ε > 0, is the best one
can achieve.

2.3  The  branching  structure  of  plays 

Many  of  the  arguments  developed  in  this  paper  rely  on  a  detailed  analysis 
of  the  branching  process  resulting  from  the  strategies  chosen  by  the  players, 
and  from  the  probabilistic  transition  relation  of  the  game.  In  order  to  make 
our  arguments  precise,  we  need  some  definitions.  A  play  is  feasible  if  each  of 
its  transitions  could  have  arisen  according  to  the  transition  relation  of  the 
game. 

Definition 5 (Feasible plays and outcomes) Given strategies $\sigma$ for
player 1 and $\pi$ for player 2, a play $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ is feasible in a
concurrent game graph $\mathcal{G}$ if for every $k \in \mathbb{N}$ there are moves $a_1 \in \Gamma_1(s_k)$ and
$a_2 \in \Gamma_2(s_k)$ such that the following conditions hold:

1. $s_{k+1} \in \mathrm{Dest}(s_k, a_1, a_2)$;

2. $\sigma(s_0, s_1, \ldots, s_k)(a_1) > 0$; and

3. $\pi(s_0, s_1, \ldots, s_k)(a_2) > 0$.

Given strategies $\sigma \in \Sigma$ and $\pi \in \Pi$, and a state $s$, we denote by
$\mathrm{Outcome}(s, \sigma, \pi) \subseteq \Omega_s$ the set of feasible plays that start from $s$, given
strategies $\sigma$ and $\pi$. ■

In order to make precise statements about the branching process arising
from a game play, we define below trees labeled by game states.

Definition 6 (Infinite trees, S-labeled trees and trees for events)
An infinite tree is a set $Tr \subseteq \mathbb{N}^*$ such that

• if $x \cdot i \in Tr$ where $x \in \mathbb{N}^*$ and $i \in \mathbb{N}$ then $x \in Tr$;

• for all $x \in Tr$ there exists $i \in \mathbb{N}$ such that $x \cdot i \in Tr$. We refer to $x \cdot i$
as a successor of $x$.

We call the elements of $Tr$ nodes, and the empty word $\epsilon$ is the root of the
tree. An infinite path $\tau$ of $Tr$ is a set $\tau \subseteq Tr$ such that

• $\epsilon \in \tau$;

• for every $x$ in $\tau$ there is a unique $i \in \mathbb{N}$ such that $x \cdot i \in \tau$. Note that
for every $i \in \mathbb{N}$, there is a unique element $x \in \tau$ such that $|x| = i$.
We denote by $\tau_i$ the element $x \in \tau$ such that $|x| = i$.

Given an infinite tree $Tr$ and a node $x \in Tr$, we denote by $Tr(x)$ the
sub-tree rooted at node $x$. Formally, $Tr(x)$ denotes the set
$\{ x' \in Tr \mid x \text{ is a prefix of } x' \}$.

An S-labeled tree $T$ is a pair $(Tr, \langle \cdot \rangle)$, where $Tr$ is a tree and $\langle \cdot \rangle : Tr \to S$
maps each node of $Tr$ to a state $s \in S$. Given an S-labeled tree $T$ and an
infinite path $\tau \subseteq Tr$, we denote by $\langle \tau \rangle$ the play $\langle s_0, s_1, s_2, \ldots \rangle$ such that
$s_0 = \langle \epsilon \rangle$ and for all $i > 0$ we have $s_i = \langle \tau_i \rangle$. An S-labeled tree $T_s = (Tr_s, \langle \cdot \rangle)$,
where $\langle \epsilon \rangle = s$, represents a set of infinite paths, denoted as $\mathcal{L}(T_s) \subseteq \Omega_s$, such
that

$$\mathcal{L}(T_s) = \{ \omega = \langle s_0 = s, s_1, s_2, \ldots \rangle \in \Omega_s \mid \exists \tau \subseteq Tr_s.\ \langle \tau \rangle = \omega \}.$$

An S-labeled tree $T_s$ represents an event $\mathcal{A} \subseteq \Omega_s$ if and only if $\mathcal{L}(T_s) = \mathcal{A}$.
We denote by $T_{\mathcal{A},s}$ an S-labeled tree that represents an event $\mathcal{A} \subseteq \Omega_s$ and
denote by $Tr_{\mathcal{A},s}$ the tree of $T_{\mathcal{A},s}$. ■

Several of the following results will be phrased in terms of the S-labeled tree
$T^{\sigma,\pi}_{\mathcal{A},s}$, which represents the outcomes from $s \in S$ that result from player 1
using strategy $\sigma$ and player 2 using strategy $\pi$, and that belong to a specified
event $\mathcal{A}$.

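Definition 7 (Trees for strategies) Given a measurable event $\mathcal{A}$,
strategies $\sigma, \pi$, and a state $s$ such that $\Pr_s^{\sigma,\pi}(\mathcal{A}) > 0$, we denote by $T^{\sigma,\pi}_{\mathcal{A},s}$ an S-labeled
tree to represent $\mathcal{A} \cap \mathrm{Outcome}(s, \sigma, \pi)$, and we also denote by $Tr^{\sigma,\pi}_{\mathcal{A},s}$ the
tree of $T^{\sigma,\pi}_{\mathcal{A},s}$. Given strategies $\sigma, \pi$, we denote by $T^{\sigma,\pi}_s$ the S-labeled tree
$T_{\mathrm{Outcome}(s,\sigma,\pi),s}$; we also denote by $Tr^{\sigma,\pi}_s$ the tree of $T^{\sigma,\pi}_s$. ■

Notations. Let $T = (Tr, \langle \cdot \rangle)$ be an S-labeled tree and $x \in Tr$ such that
$|x| = n$. We denote by $x_i$ the prefix of $x$ of length $i$. We denote by
$\mathrm{hist}(x) = \langle \langle \epsilon \rangle, \langle x_1 \rangle, \ldots, \langle x_n \rangle \rangle$ the history represented by the path from
the root to the node $x$. We denote by
$\mathrm{Cone}(x) = \{ \omega = \langle s_0, s_1, s_2, \ldots \rangle \mid \langle x_i \rangle = s_i \text{ for all } 0 \leq i \leq n \}$
the set of paths with the prefix $\mathrm{hist}(x)$. Given
a measurable event $\mathcal{A} \subseteq \Omega_s$ and strategies $\sigma$ and $\pi$ such that $\Pr_s^{\sigma,\pi}(\mathcal{A}) > 0$,
consider the S-labeled tree $T^{\sigma,\pi}_{\mathcal{A},s}$ to represent $\mathcal{A} \cap \mathrm{Outcome}(s, \sigma, \pi)$.
Consider the event
$AnU = \bigcup \{ \mathrm{Cone}(x) \mid x \in Tr^{\sigma,\pi}_{\mathcal{A},s},\ \Pr_s^{\sigma,\pi}(\mathrm{Cone}(x) \cap \mathcal{A}) = 0 \}$.
Since $AnU$ is a countable union of measurable sets each of which meets $\mathcal{A}$ in a
set of measure 0, we have $\Pr_s^{\sigma,\pi}(AnU \cap \mathcal{A}) = 0$. Hence, in the sequel, without loss of
generality, given any event $\mathcal{A}$ we only consider the event $\mathcal{A} \setminus AnU$, and by a
slight abuse of notation use $T^{\sigma,\pi}_{\mathcal{A},s}$ to represent the stochastic tree $T^{\sigma,\pi}_{\mathcal{A} \setminus AnU,s}$.
Hence, without loss of generality we assume that for any $x \in Tr^{\sigma,\pi}_{\mathcal{A},s}$ we have
$\Pr_s^{\sigma,\pi}(\mathrm{Cone}(x) \cap \mathcal{A}) > 0$. Henceforth, for any $x \in Tr^{\sigma,\pi}_{\mathcal{A},s}$ we write $\Pr_x^{\sigma,\pi}(H \mid \mathcal{A})$
to denote $\Pr_s^{\sigma,\pi}(H \mid \mathrm{Cone}(x), \mathcal{A})$.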

Definition 8 (Perennial ε-optimal and perennial ε-spoiling strategies)
For a parity objective $\Psi$ and for $\varepsilon > 0$, a strategy $\sigma$ is a perennial ε-optimal
strategy for player 1, from state $s$, with respect to objective $\Psi$ if for
all strategies $\pi$ and for all nodes $x$ in the stochastic tree $T^{\sigma,\pi}_s$ we have
$\Pr_s^{\sigma,\pi}(\Psi \mid \mathrm{Cone}(x)) \geq \langle\langle 1 \rangle\rangle_{val}(\Psi)(\langle x \rangle) - \varepsilon$,
i.e., in the stochastic sub-tree rooted at $x$
player 1 is ensured the zero-sum value of the game at $\langle x \rangle$ within ε-precision.
Perennial ε-optimal strategies for player 2 are defined analogously. Given
a nonzero-sum concurrent game with objective $\Psi_1$ for player 1 and $\Psi_2$ for
player 2, a strategy $\sigma$ is perennial ε-optimal if it is perennial ε-optimal
with respect to objective $\Psi_1$, and a strategy $\sigma$ is perennial ε-spoiling if it
is perennial ε-optimal with respect to objective $\overline{\Psi}_2 = \Omega \setminus \Psi_2$. Perennial

e-optimal  and  perennial  e-spoiling  strategies  for  player  2  are  defined  simi¬ 
larly.  We  denote  by  and  H^  the  set  of  perennial  e-optimal  strategies  for 
player  1  and  player  2,  respectively.  Similarly,  we  denote  by  and  H^  the 
set  of  perennial  e-spoiling  strategies  for  player  1  and  player  2,  respectively. 


The ε-optimal strategies constructed for parity objectives in [7] are
perennial ε-optimal strategies. This gives us the following proposition.

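Proposition 1 The following assertions hold:

1. For all ε > 0, the sets of perennial ε-optimal strategies for player 1
and for player 2 are both non-empty.

2. For all ε > 0, the sets of perennial ε-spoiling strategies for player 1
and for player 2 are both non-empty.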
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3  Single  strongly  connected  component  games 

In  this  section  we  prove  the  existence  of  e-Nash  equilibrium  for  every  e  >  0, 
in  a  subclass  of  concurrent  games,  namely,  single  strongly  connected  com¬ 
ponent  games.  In  the  next  section  we  generalize  the  existence  of  e-Nash 
equilibrium,  for  every  e  >  0,  to  all  concurrent  games  using  the  result  for 
single  strongly  connected  component  games.  Given  a  game  structure  Q  we 
define  a  underlying  graph  Gq  associated  with  Q. 

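Definition 9 (Graph of a game $\mathcal{G}$)  Given a concurrent game structure
$\mathcal{G} = (S, \mathit{Moves}, \Gamma_1, \Gamma_2, \delta)$, the graph of the game $\mathcal{G}$ is a directed graph $G_{\mathcal{G}} = (S_{\mathcal{G}}, E_{\mathcal{G}})$ that is defined as follows:

•  $S_{\mathcal{G}} = S$, i.e., the set of states of $G_{\mathcal{G}}$ is the same as the state space of $\mathcal{G}$.

•  $E_{\mathcal{G}} = \{\, (s,t) \mid \exists\, a_1 \in \Gamma_1(s).\ \exists\, a_2 \in \Gamma_2(s).\ t \in \mathrm{Dest}(s, a_1, a_2) \,\}$.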
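
As an illustration of Definition 9, the following minimal Python sketch builds the edge set of the graph from a transition function. The dictionary-based encoding (`moves1`, `moves2`, `delta`) and the two-state example are assumptions made for this sketch and do not come from the paper; $\mathrm{Dest}(s, a_1, a_2)$ is taken to be the set of states to which `delta[(s, a1, a2)]` assigns positive probability.

```python
from itertools import product


def game_graph(states, moves1, moves2, delta):
    """Return the edge set {(s, t)}: t has positive probability under some move pair at s."""
    edges = set()
    for s in states:
        for a1, a2 in product(moves1[s], moves2[s]):
            for t, prob in delta[(s, a1, a2)].items():
                if prob > 0:
                    edges.add((s, t))
    return edges


# Hypothetical two-state game structure (illustration only).
states = ["s0", "s1"]
moves1 = {"s0": ["a", "b"], "s1": ["a"]}
moves2 = {"s0": ["c"], "s1": ["c"]}
delta = {
    ("s0", "a", "c"): {"s0": 0.5, "s1": 0.5},
    ("s0", "b", "c"): {"s1": 1.0},
    ("s1", "a", "c"): {"s1": 1.0},
}
edges = game_graph(states, moves1, moves2, delta)
```

In this example `s1` is absorbing, so the only edge leaving it is its self-loop.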


Definition 10 (Single strongly connected component (Sscc) games)
Let $\mathcal{G}$ be a concurrent game and $G_{\mathcal{G}}$ be the graph of $\mathcal{G}$. We call $\mathcal{G}$ a single
strongly connected component (Sscc) game if the graph $G_{\mathcal{G}}$ satisfies the
following conditions:

•  The state space $S_{\mathcal{G}}$ can be partitioned into three sets: $C$, $U$, and $T = \{\, t_{00}, t_{10}, t_{01}, t_{11} \,\}$.

•  $C$ is a strongly connected component in the graph $G_{\mathcal{G}}$.

•  The states $t_{ij} \in T$ are absorbing states, for $i, j \in \{0, 1\}$. The priority
functions for the states in $T$ are as follows: $p_1(t_{ij}) = i$ and $p_2(t_{ij}) = j$,
for $i, j \in \{0, 1\}$. Note that at state $t_{00}$ the objectives of both players
are satisfied; at state $t_{01}$ only player 1's objective is satisfied; at state
$t_{10}$ only player 2's objective is satisfied; and at state $t_{11}$ neither
player's objective is satisfied.

•  For every state $s \in U$ we have $|\Gamma_i(s)| = 1$ for $i \in \{1, 2\}$ and $(\{s\} \times S_{\mathcal{G}}) \cap E_{\mathcal{G}} \subseteq \{s\} \times T$. In other words, at states in $U$ there is no nontrivial
choice of moves for the players, and thus from any state $s$ in $U$
the game proceeds to the set $T$ according to the probability distribution
of the transition relation at $s$.

•  $(C \times (S_{\mathcal{G}} \setminus C)) \cap E_{\mathcal{G}} \subseteq C \times U$, i.e., every edge going out of $C$ ends at a
state in $U$.

Figure 1: An Sscc game


Figure 1 illustrates an Sscc game. ∎

The following proposition states that if the existence of an $\varepsilon$-Nash equilibrium
is established at a state $s$, then state $s$ can be replaced by a gadget, and
to prove the existence of an $\varepsilon$-Nash equilibrium in the original game it suffices to prove
the existence of an $\varepsilon$-Nash equilibrium in the transformed game in which the gadget
replaces state $s$.

Proposition 2  Let $\mathcal{G}$ be an Sscc game with objectives $\Psi_1$ and $\Psi_2$ for player 1
and player 2, respectively. Suppose $(\sigma^*, \pi^*)$ is an $\varepsilon$-Nash equilibrium profile
at $s$, with $\varepsilon \to 0$, such that $x_1(s) = \Pr_s^{\sigma^*,\pi^*}(\Psi_1)$ and $x_2(s) = \Pr_s^{\sigma^*,\pi^*}(\Psi_2)$.
The game graph $\mathcal{G}$ can be transformed to a game graph $\mathcal{G}'$ by replacing the
state $s$ with the following gadget (Figure 2) such that if there is an $\varepsilon$-Nash
equilibrium in the transformed game $\mathcal{G}'$, for every $\varepsilon > 0$, then there is an
$\varepsilon$-Nash equilibrium in the original game $\mathcal{G}$, for every $\varepsilon > 0$. The gadget is
as follows:

•  Without loss of generality let $x_1(s) \le x_2(s)$ (when $x_2(s) \le x_1(s)$ the
gadget is symmetric). Then the gadget that replaces $s$ is as follows: $\Gamma_1(s) = \{a\}$, $\Gamma_2(s) = \{b\}$, and

$$\delta(s,a,b)(t_{00}) = x_1(s), \qquad \delta(s,a,b)(t_{10}) = x_2(s) - x_1(s),$$

$$\delta(s,a,b)(t_{11}) = 1 - x_2(s), \qquad \delta(s,a,b)(t_{01}) = 0,$$
where the states $t_{ij}$ are as defined in Definition 10.

The gadget is illustrated in Figure 2. The construction ensures that
at state $s$ the set $\{t_{00}, t_{01}\}$ of states is reached with probability $x_1(s)$, i.e.,
player 1's objective is satisfied with probability $x_1(s)$, and the set $\{t_{00}, t_{10}\}$ of
states is reached with probability $x_2(s)$, i.e., player 2's objective is satisfied
with probability $x_2(s)$. ∎

Figure 2: The gadget

The result follows from the observation that player 1 and player 2 can switch
to the strategies $(\sigma^*, \pi^*)$ when the game reaches $s$.
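
The arithmetic behind the gadget can be checked with a short Python sketch. The function names and the string labels for the terminal states are assumptions of this sketch; the branch for $x_2(s) < x_1(s)$ is one natural reading of "the gadget is symmetric" and is not spelled out in the text.

```python
def gadget_distribution(x1, x2):
    """Transition distribution of the gadget over the terminal states t_ij."""
    if not (0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0):
        raise ValueError("equilibrium values must be probabilities")
    if x1 <= x2:
        # Case stated in Proposition 2.
        return {"t00": x1, "t10": x2 - x1, "t01": 0.0, "t11": 1.0 - x2}
    # Assumed symmetric case: roles of t10 and t01 are exchanged.
    return {"t00": x2, "t01": x1 - x2, "t10": 0.0, "t11": 1.0 - x1}


def payoffs(dist):
    """Probability of reaching {t00, t01} (player 1) and {t00, t10} (player 2)."""
    return dist["t00"] + dist["t01"], dist["t00"] + dist["t10"]


d = gadget_distribution(0.25, 0.75)
d_sym = gadget_distribution(0.75, 0.25)
```

For `x1 = 0.25` and `x2 = 0.75` the gadget moves to `t00` with probability 0.25, to `t10` with probability 0.5 and to `t11` with probability 0.25, so the two players obtain exactly 0.25 and 0.75.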

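Lemma 1  Let $\mathcal{G}$ be an Sscc game with objectives $\Psi_1$ and $\Psi_2$ for player 1 and
player 2, respectively. If any of the following four properties (P1-P4) holds,
then for every $\varepsilon > 0$ there is an $\varepsilon$-Nash equilibrium $(\sigma^*, \pi^*)$ for every state
$s \in C$. The properties (P1-P4) are as follows:

•  (P1) There is a state $s \in C$ such that $\langle\!\langle 1,2 \rangle\!\rangle_{val}(\Psi_1 \cap \Psi_2)(s) = 1$.

•  (P2) There is a state $s \in C$ such that $\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) = 1$.

•  (P3) There is a state $s \in C$ such that $\langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) = 1$.

•  (P4) There is a state $s \in C$ such that $\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) = 0$ and
$\langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) = 0$.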

Proof.  The proof is by induction on the size of $C$, i.e., on $|C|$. The base case
$|C| = 0$ is trivial. We now prove the inductive case:

1.  Suppose there is a state $s \in C$ such that $\langle\!\langle 1,2 \rangle\!\rangle_{val}(\Psi_1 \cap \Psi_2)(s) = 1$;
then there is a strategy profile $(\sigma, \pi)$ such that $\Pr_s^{\sigma,\pi}(\Psi_1) = 1$ and
$\Pr_s^{\sigma,\pi}(\Psi_2) = 1$. Since 1 is the maximum payoff a player can achieve,
$(\sigma, \pi)$ is a Nash equilibrium. By Proposition 2 we can replace $s$
by the gadget described there. This breaks $C$ into smaller
strongly connected components. We can then apply the induction hypothesis
on the smaller strongly connected components in a bottom-up
order. The idea is as follows: consider a strongly connected component
$C' \subset C$ in the game where $s$ is replaced by the gadget of Proposition 2.
By the induction hypothesis, for every strongly connected
component $C_1 \subset C$ such that $C_1$ is lower than $C'$ (i.e., there is a path
from some state in $C'$ to a state in $C_1$ in the graph of the transformed
game), an $\varepsilon$-Nash equilibrium exists for every state $s_1 \in C_1$; the induction
hypothesis applies since $|C_1| < |C|$. By Proposition 2 every state
$s_1 \in C_1$ can be replaced by the gadget of Proposition 2. Hence the
strongly connected component $C'$, together with the strongly connected
components lower than $C'$ replaced by gadgets, forms an Sscc game.
Since $|C'| < |C|$, by the induction hypothesis on $C'$ there exists an $\varepsilon$-Nash
equilibrium (hence also $\varepsilon$-Nash equilibrium values) from every state
$s' \in C'$. Then by applying Proposition 2 we can replace each state
$s' \in C'$ by the gadget described in Proposition 2 and proceed.

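2.  Suppose there is a state $s \in C$ such that $\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) = 1$. Then
for every $\varepsilon > 0$ there is an $\varepsilon$-optimal strategy $\sigma_\varepsilon$ for player 1 such
that $\inf_{\pi \in \Pi} \Pr_s^{\sigma_\varepsilon,\pi}(\Psi_1) \ge 1 - \varepsilon$. Consider a strategy $\pi^*$ such that
$\Pr_s^{\sigma_\varepsilon,\pi^*}(\Psi_2) \ge \sup_{\pi \in \Pi} \Pr_s^{\sigma_\varepsilon,\pi}(\Psi_2) - \varepsilon$. In other words, we fix an
$\varepsilon$-optimal strategy $\sigma_\varepsilon$ for player 1 and a strategy $\pi^*$ for player 2 that
ensures player 2 the maximal probability to satisfy $\Psi_2$ against the
strategy $\sigma_\varepsilon$, within $\varepsilon$-precision. Thus we have

$$\sup_{\sigma \in \Sigma} \Pr_s^{\sigma,\pi^*}(\Psi_1) \ \le\ 1 \ \le\ \Pr_s^{\sigma_\varepsilon,\pi^*}(\Psi_1) + \varepsilon;$$

$$\sup_{\pi \in \Pi} \Pr_s^{\sigma_\varepsilon,\pi}(\Psi_2) \ \le\ \Pr_s^{\sigma_\varepsilon,\pi^*}(\Psi_2) + \varepsilon.$$

Hence $(\sigma_\varepsilon, \pi^*)$ is an $\varepsilon$-Nash equilibrium. Hence we can fix the value of
the $\varepsilon$-Nash equilibrium at state $s$, and the argument that an $\varepsilon$-Nash
equilibrium exists for every state in $C$ then follows from the induction
hypothesis and Proposition 2 as described earlier. The proof for the
case when there is a state $s$ such that $\langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) = 1$ is symmetric.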

3.  Suppose there is a state $s \in C$ such that $\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) = 0$ and
$\langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) = 0$. Then consider a pair $(\sigma_\varepsilon, \pi_\varepsilon)$ of $\varepsilon$-spoiling
strategies for player 1 and player 2, respectively. Since $\sigma_\varepsilon$ is an $\varepsilon$-spoiling strategy it follows that

$$\sup_{\pi \in \Pi} \Pr_s^{\sigma_\varepsilon,\pi}(\Psi_2) \ \le\ \varepsilon.$$

Similarly, since $\pi_\varepsilon$ is an $\varepsilon$-spoiling strategy we have

$$\sup_{\sigma \in \Sigma} \Pr_s^{\sigma,\pi_\varepsilon}(\Psi_1) \ \le\ \varepsilon.$$

Hence $(\sigma_\varepsilon, \pi_\varepsilon)$ is an $\varepsilon$-Nash equilibrium at $s$. That
there is an $\varepsilon$-Nash equilibrium at every state in $C$ follows by an
argument similar to the previous cases.

The desired result follows. ∎

In the next sub-section we show that if the four properties (P1-P4) of
Lemma 1 are not satisfied, then the nonzero-sum Sscc game with parity
objectives can be reduced to a nonzero-sum game with reachability
objectives with some desired properties. The reachability objectives are
$\mathrm{Reach}(\{t_{00}, t_{01}\})$ for player 1 and $\mathrm{Reach}(\{t_{00}, t_{10}\})$ for player 2. We then
establish the existence of an $\varepsilon$-Nash equilibrium, for every $\varepsilon > 0$, in the original
game by the use of punishing or spoiling strategies.

3.1  Non-zero sum reachability game

Let $W_1 = \{t_{00}, t_{01}\}$ and $W_2 = \{t_{00}, t_{10}\}$. We consider the nonzero-sum
reachability game $\mathcal{G}_R$ such that the objective for player 1 is to reach $W_1$,
i.e., $\mathrm{Reach}(W_1)$, and the objective for player 2 is to reach $W_2$, i.e., $\mathrm{Reach}(W_2)$.

Lemma 2 to Lemma 4 were proved in [2]; we present the proofs for the sake of
completeness.

In the sequel, we consider stochastic trees for which $\Pr_s^{\sigma,\pi}(\mathcal{A}) > 0$.

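Given a stochastic tree $\mathrm{Tr}_s^{\sigma,\pi}$, let $\kappa$ be a subset of nodes, i.e., $\kappa \subseteq \mathrm{Tr}_s^{\sigma,\pi}$.
Analogous to the definition of reachability and safety we define the following
notions of reachability and safety in the stochastic tree:

1.  Reachability in tree. For a set $\kappa \subseteq \mathrm{Tr}_s^{\sigma,\pi}$, let

$$\mathrm{ReachTree}(\kappa) = \{\, \langle r \rangle \mid r \text{ is an infinite path in } \mathrm{Tr}_s^{\sigma,\pi} \text{ such that } \exists\, i \in \mathbb{N}.\ r_i \in \kappa \,\}$$
denote the set of paths that reach the subset $\kappa$ of nodes.

2.  Safety in tree. For a set $\kappa \subseteq \mathrm{Tr}_s^{\sigma,\pi}$, let

$$\mathrm{SafeTree}(\kappa) = \{\, \langle r \rangle \mid r \text{ is an infinite path in } \mathrm{Tr}_s^{\sigma,\pi} \text{ such that } \forall\, i \in \mathbb{N}.\ r_i \in \kappa \,\}$$

denote the set of paths that stay safe in the subset $\kappa$ of nodes.

Given a positive integer $k$ and a set $\kappa \subseteq \mathrm{Tr}_s^{\sigma,\pi}$, we define
$\mathrm{ReachTree}^k(\kappa) = \{\, \langle r \rangle \mid r \text{ is an infinite path in } \mathrm{Tr}_s^{\sigma,\pi} \text{ such that } \exists\, i \le k.\ r_i \in \kappa \,\}$, i.e., the set of paths that reach $\kappa$ within
$k$ steps.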
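
To make the bounded notion $\mathrm{ReachTree}^k$ concrete, the following Python sketch computes the probability of reaching a target within $k$ steps in a finite Markov chain. This is only a special case under an assumption that is not made in the text: when both players fix memoryless strategies, the stochastic tree is the unrolling of a finite Markov chain, so a node can be identified with its current state. The chain `P` and the state names are hypothetical.

```python
def bounded_reach(P, target, k):
    """Probability, from each state, of reaching `target` within k steps.

    P maps a state to a dict {successor: probability}.
    """
    prob = {s: (1.0 if s in target else 0.0) for s in P}
    for _ in range(k):
        # One step of backward induction: target states count as already reached.
        prob = {
            s: 1.0 if s in target else sum(p * prob[t] for t, p in P[s].items())
            for s in P
        }
    return prob


# Hypothetical chain: from "a", stay in "a" or move to "goal" with probability 1/2 each.
P = {"a": {"a": 0.5, "goal": 0.5}, "goal": {"goal": 1.0}}
within_three = bounded_reach(P, {"goal"}, 3)["a"]
```

In this chain the probability of not having reached `goal` after $k$ steps is $(1/2)^k$, so the bounded reachability probabilities increase to 1 as $k$ grows.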

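Lemma 2 (Reachability Lemma)  Let $\mathrm{Tr}_s^{\sigma,\pi}$ be a stochastic tree.

1.  For a set $\kappa \subseteq \mathrm{Tr}_s^{\sigma,\pi}$, if $\inf_{x \in \mathrm{Tr}_s^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid \mathcal{A}) > 0$, then
$\Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid \mathcal{A}) = 1$, for all nodes $x \in \mathrm{Tr}_s^{\sigma,\pi}$.

2.  For a set $B \subseteq S$, if $\inf_{x \in \mathrm{Tr}_s^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{Reach}(B) \mid \mathcal{A}) > 0$, then
$\Pr_x^{\sigma,\pi}(\mathrm{Reach}(B) \mid \mathcal{A}) = 1$, for all nodes $x \in \mathrm{Tr}_s^{\sigma,\pi}$.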

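Proof.  We prove the first case and show that the second case is an immediate
consequence.

1.  Let $0 < c \le \inf_{x \in \mathrm{Tr}_s^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid \mathcal{A})$. Choose $0 < c' < c$. For
every node $x \in \mathrm{Tr}_s^{\sigma,\pi}$ there exists $k_x$ such that $\Pr_x^{\sigma,\pi}(\mathrm{ReachTree}^{k_x}(\kappa) \mid \mathcal{A}) \ge c'$.
Consider $k_1 = k_\epsilon$ (recall that $\epsilon$ is the root of the tree) and
consider the frontier $F_1$ of $\mathrm{Tr}_s^{\sigma,\pi}$ at depth $k_1$. Given a frontier $F$ at
depth $k$, let $\overline{F}$ be the set of nodes $x$ in $F$ such that the path from the
root to $x$ has not visited a node in $\kappa$, i.e., none of $\epsilon, x_1, x_2, \ldots, x_{|x|}$ is
in $\kappa$. For a frontier $F_i$, define $k_{i+1} = \max\{\, k_x \mid x \in \overline{F}_i \,\}$. Inductively,
define the frontier $F_{i+1}$ at depth $k_1 + \cdots + k_{i+1}$. It follows that for $k = k_1 + \cdots + k_n$ we have
$\Pr_\epsilon^{\sigma,\pi}(\Omega \setminus \mathrm{ReachTree}^k(\kappa) \mid \mathcal{A}) \le (1 - c')^n$. Since
$\lim_{n \to \infty} (1 - c')^n = 0$, the desired result follows for the root of the
tree. Since $\inf_{x \in \mathrm{Tr}_s^{\sigma,\pi}} \Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid \mathcal{A}) > 0$, it follows that for
every node $x \in \mathrm{Tr}_s^{\sigma,\pi}$ we have $\inf_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{ReachTree}(\kappa) \mid \mathcal{A}) > 0$.
Arguing similarly for the subtree rooted at the node $x$, the desired
result follows.

2.  Observe that with $\kappa = \{\, x \in \mathrm{Tr}_s^{\sigma,\pi} \mid \langle x \rangle \in B \,\}$ we have $\mathrm{Reach}(B) = \mathrm{ReachTree}(\kappa)$. The result is immediate from part 1. ∎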


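Notations.  Let $\mathcal{A} \subseteq \Omega$ be a measurable event such that $\Pr_s^{\sigma,\pi}(\mathcal{A}) > 0$.
For a set $R \subseteq S$, let $\mathrm{InfSet}(R) = \{\, \omega \mid \mathrm{Inf}(\omega) \subseteq R \,\}$. For a set $R \subseteq S$, let
$\mathrm{InfSetEq}(R) = \{\, \omega \mid \mathrm{Inf}(\omega) = R \,\}$. Given a node $x$ in $\mathrm{Tr}_s^{\sigma,\pi}$ and $\varepsilon > 0$, we
define $\mathcal{C}_{\mathcal{A},\varepsilon}(x)$ as follows:

$$\mathcal{C}_{\mathcal{A},\varepsilon}(x) = \{\, R \subseteq S \mid \Pr_x^{\sigma,\pi}(\mathrm{InfSet}(R) \mid \mathcal{A}) \ge 1 - \varepsilon \,\}.$$

Note that for $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$ such that $\varepsilon_1 \le \varepsilon_2$, for all nodes $x \in \mathrm{Tr}_s^{\sigma,\pi}$, if $R \in \mathcal{C}_{\mathcal{A},\varepsilon_1}(x)$ then $R \in \mathcal{C}_{\mathcal{A},\varepsilon_2}(x)$.
We define $\mathcal{C}_{\mathcal{A}}(x) = \lim_{\varepsilon \to 0} \mathcal{C}_{\mathcal{A},\varepsilon}(x)$.
The monotonicity of $\mathcal{C}_{\mathcal{A},\varepsilon}$ with respect to $\varepsilon$ ensures that $\mathcal{C}_{\mathcal{A}}(x)$
exists for all $x \in \mathrm{Tr}_s^{\sigma,\pi}$.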


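Lemma 3  For every node $x \in \mathrm{Tr}_s^{\sigma,\pi}$, there is a unique minimal element of
$\mathcal{C}_{\mathcal{A}}(x)$ under the $\subseteq$ ordering.


Proof.  Consider a node $x \in \mathrm{Tr}_s^{\sigma,\pi}$. Let $R_1$ and $R_2$ be two distinct minimal
elements in $\mathcal{C}_{\mathcal{A}}(x)$. Consider an arbitrary $\varepsilon > 0$. It follows from the
definition that $\Pr_x^{\sigma,\pi}(\mathrm{InfSet}(R_i) \mid \mathcal{A}) \ge 1 - \frac{\varepsilon}{2}$ for $i \in \{1, 2\}$.
The union $\mathrm{InfSet}(R_1) \cup \mathrm{InfSet}(R_2)$ has conditional probability at most 1, and
$\mathrm{InfSet}(R_1) \cap \mathrm{InfSet}(R_2) = \mathrm{InfSet}(R_1 \cap R_2)$. Hence we have the
following inequality:

$$\Pr_x^{\sigma,\pi}(\mathrm{InfSet}(R_1) \mid \mathcal{A}) + \Pr_x^{\sigma,\pi}(\mathrm{InfSet}(R_2) \mid \mathcal{A}) - \Pr_x^{\sigma,\pi}(\mathrm{InfSet}(R_1 \cap R_2) \mid \mathcal{A}) \ \le\ 1.$$

Hence it follows that $\Pr_x^{\sigma,\pi}(\mathrm{InfSet}(R_1 \cap R_2) \mid \mathcal{A}) \ge 1 - \varepsilon$. Since this holds for every
$\varepsilon > 0$, we have $R_1 \cap R_2 \in \mathcal{C}_{\mathcal{A}}(x)$.
However, this contradicts the assumption that $R_1$ and $R_2$ are
distinct minimal elements of $\mathcal{C}_{\mathcal{A}}(x)$. ∎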

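We define the function $\mathcal{M}_{\mathcal{A}} : \mathrm{Tr}_s^{\sigma,\pi} \to 2^S$ that assigns to every node
$x \in \mathrm{Tr}_s^{\sigma,\pi}$ the minimum element of $\mathcal{C}_{\mathcal{A}}(x)$. Formally, we have

$$\mathcal{M}_{\mathcal{A}}(x) \ =\ \bigcap_{B \in \mathcal{C}_{\mathcal{A}}(x)} B \ =\ \lim_{\varepsilon \to 0} \bigcap_{B \in \mathcal{C}_{\mathcal{A},\varepsilon}(x)} B.$$


Proposition 3  For every $x \in \mathrm{Tr}_s^{\sigma,\pi}$, for every successor $x_1$ of $x$ we have
$\mathcal{M}_{\mathcal{A}}(x_1) \subseteq \mathcal{M}_{\mathcal{A}}(x)$.

Proof.  By definition, for all nodes $x, x_1 \in \mathrm{Tr}_s^{\sigma,\pi}$ such that $x_1$ is a successor
of $x$, every set in $\mathcal{C}_{\mathcal{A}}(x)$ also belongs to $\mathcal{C}_{\mathcal{A}}(x_1)$, i.e., $\mathcal{C}_{\mathcal{A}}(x) \subseteq \mathcal{C}_{\mathcal{A}}(x_1)$: an event that has conditional probability 1 at $x$ has conditional probability 1 at each of its successors. Since $\mathcal{M}_{\mathcal{A}}$ is the intersection of the respective family, the result is an easy consequence of the
above fact. ∎

Lemma 4  Given an $S$-labeled tree $\mathrm{Tr}_s^{\sigma,\pi}$, for all nodes $x \in \mathrm{Tr}_s^{\sigma,\pi}$, for all $\varepsilon > 0$,
there is a set $B \subseteq S$ and a node $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ such that for all nodes $x_2 \in \mathrm{Tr}_s^{\sigma,\pi}(x_1)$
we have

$$\Pr_{x_2}^{\sigma,\pi}(\mathrm{InfSetEq}(B) \mid \mathcal{A}) \ \ge\ 1 - \varepsilon.$$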

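Proof.  The proof is by induction on $|\mathcal{M}_{\mathcal{A}}(x)|$.

Base Case. If $|\mathcal{M}_{\mathcal{A}}(x)| = 1$, let $\mathcal{M}_{\mathcal{A}}(x) = \{s\}$. Then for all nodes $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$
we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSet}(\{s\}) \mid \mathcal{A}) \ge 1 - \varepsilon$, for all $\varepsilon > 0$. Thus for all
nodes $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$, for all $\varepsilon > 0$, we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSetEq}(\{s\}) \mid \mathcal{A}) \ge 1 - \varepsilon$.

Inductive Case. Suppose there exists a node $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ such that
$\mathcal{M}_{\mathcal{A}}(x_1) \subsetneq \mathcal{M}_{\mathcal{A}}(x)$; then $|\mathcal{M}_{\mathcal{A}}(x_1)| < |\mathcal{M}_{\mathcal{A}}(x)|$ and the result follows
by the induction hypothesis at $x_1$. Otherwise, for every node $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$
we have $\mathcal{M}_{\mathcal{A}}(x_1) = \mathcal{M}_{\mathcal{A}}(x)$. Let the set $\mathcal{M}_{\mathcal{A}}(x)$ be $B$. We have

$$\lim_{\varepsilon \to 0} \bigcap_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Big( \bigcap_{D \in \mathcal{C}_{\mathcal{A},\varepsilon}(x_1)} D \Big) = B.$$

•  Suppose we have $\inf_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}(\{s\}) \mid \mathcal{A}) > 0$ for all states
$s \in B$. Then it follows from Lemma 2 that for all nodes $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$
we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}(\{s\}) \mid \mathcal{A}) = 1$. Hence for all nodes $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$
we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSetEq}(B) \mid \mathcal{A}) = 1$.

•  Otherwise, consider a state $s \in B$ such that
$\inf_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}(\{s\}) \mid \mathcal{A}) = 0$. Hence it follows that,
for every $\varepsilon > 0$, there is a node $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ such that
$\Pr_{x_1}^{\sigma,\pi}(\mathrm{InfSet}(B \setminus \{s\}) \mid \mathcal{A}) \ge 1 - \varepsilon$. Formally, we have
$\lim_{\varepsilon \to 0} \bigcap_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \big( \bigcap_{D \in \mathcal{C}_{\mathcal{A},\varepsilon}(x_1)} D \big) \subseteq B \setminus \{s\}$. This is a
contradiction to the fact that for all nodes $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ we have
$\mathcal{M}_{\mathcal{A}}(x_1) = B$ (i.e., $\lim_{\varepsilon \to 0} \bigcap_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \big( \bigcap_{D \in \mathcal{C}_{\mathcal{A},\varepsilon}(x_1)} D \big) = B$).

The desired result follows. ∎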

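Lemma 5  Given a stochastic tree $\mathrm{Tr}_s^{\sigma,\pi}$, for all nodes $x \in \mathrm{Tr}_s^{\sigma,\pi}$, for every
$\varepsilon > 0$, there is a node $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ such that for all nodes $x_2 \in \mathrm{Tr}_s^{\sigma,\pi}(x_1)$
one of the following conditions (C1-C4) holds:

1.  (C1) $\Pr_{x_2}^{\sigma,\pi}(\Psi_1 \mid \mathcal{A}) \ge 1 - \varepsilon$ and $\Pr_{x_2}^{\sigma,\pi}(\Psi_2 \mid \mathcal{A}) \ge 1 - \varepsilon$;

2.  (C2) $\Pr_{x_2}^{\sigma,\pi}(\Psi_1 \mid \mathcal{A}) \ge 1 - \varepsilon$ and $\Pr_{x_2}^{\sigma,\pi}(\Psi_2 \mid \mathcal{A}) \le \varepsilon$;

3.  (C3) $\Pr_{x_2}^{\sigma,\pi}(\Psi_1 \mid \mathcal{A}) \le \varepsilon$ and $\Pr_{x_2}^{\sigma,\pi}(\Psi_2 \mid \mathcal{A}) \ge 1 - \varepsilon$;

4.  (C4) $\Pr_{x_2}^{\sigma,\pi}(\Psi_1 \mid \mathcal{A}) \le \varepsilon$ and $\Pr_{x_2}^{\sigma,\pi}(\Psi_2 \mid \mathcal{A}) \le \varepsilon$.

Proof.  It follows from Lemma 4 that for every $\varepsilon > 0$ there is a node
$x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ and a set $B$ such that for all nodes $x_2 \in \mathrm{Tr}_s^{\sigma,\pi}(x_1)$ we have
$\Pr_{x_2}^{\sigma,\pi}(\mathrm{InfSetEq}(B) \mid \mathcal{A}) \ge 1 - \varepsilon$. The following case analysis proves the
result:

1.  If $\min(p_1(B))$ is even and $\min(p_2(B))$ is even, then condition (C1) is
satisfied.

2.  If $\min(p_1(B))$ is even and $\min(p_2(B))$ is odd, then condition (C2) is
satisfied.

3.  If $\min(p_1(B))$ is odd and $\min(p_2(B))$ is even, then condition (C3) is
satisfied.

4.  If $\min(p_1(B))$ is odd and $\min(p_2(B))$ is odd, then condition (C4) is
satisfied.

Hence, it also follows that for every stochastic tree $\mathrm{Tr}_s^{\sigma,\pi}$, for all nodes $x \in \mathrm{Tr}_s^{\sigma,\pi}$,
for every $\varepsilon > 0$, there is a node $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ such that for all nodes
$x_2 \in \mathrm{Tr}_s^{\sigma,\pi}(x_1)$ either $\max\{\Pr_{x_2}^{\sigma,\pi}(\Psi_1 \mid \mathcal{A}), \Pr_{x_2}^{\sigma,\pi}(\Psi_2 \mid \mathcal{A})\} \ge 1 - \varepsilon$, or
$\min\{\Pr_{x_2}^{\sigma,\pi}(\Psi_1 \mid \mathcal{A}), \Pr_{x_2}^{\sigma,\pi}(\Psi_2 \mid \mathcal{A})\} \le \varepsilon$. ∎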
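
The case analysis in the proof of Lemma 5 depends only on the parity of the least priority in the set $B$ of states visited infinitely often. A minimal Python sketch of that classification follows; the function names, the dictionary encoding of the priority functions $p_1, p_2$ and the example priorities are assumptions of the sketch.

```python
def satisfies_parity(priority, B):
    """A parity objective holds on plays with Inf = B iff the least priority in B is even."""
    return min(priority[s] for s in B) % 2 == 0


def lemma5_condition(p1, p2, B):
    """Return which of the conditions C1-C4 corresponds to the set B."""
    win1, win2 = satisfies_parity(p1, B), satisfies_parity(p2, B)
    if win1 and win2:
        return "C1"
    if win1:
        return "C2"
    if win2:
        return "C3"
    return "C4"


# Hypothetical priority functions on two states.
p1 = {"u": 0, "v": 1}
p2 = {"u": 1, "v": 2}
both = lemma5_condition(p1, p2, {"u", "v"})
only_v = lemma5_condition(p1, p2, {"v"})
```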


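Punishing perennial $\varepsilon$-optimal strategy construction.  We consider the
punishing perennial $\varepsilon$-optimal strategy profile $(\sigma_\varepsilon, \pi_\varepsilon)$ that is defined as
follows:

$$\sigma_\varepsilon(s_0, s_1, \ldots, s_k) = \begin{cases} \sigma_\varepsilon^{o}(s_0, s_1, \ldots, s_k) & \text{if } \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s_k) > 0, \\ \sigma_\varepsilon^{s}(s_0, s_1, \ldots, s_k) & \text{if } \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s_k) = 0, \end{cases}$$

where $\sigma_\varepsilon^{o}$ is a perennial $\varepsilon$-optimal strategy and $\sigma_\varepsilon^{s}$ is a perennial $\varepsilon$-spoiling strategy for player 1. That is, player 1 follows a perennial $\varepsilon$-optimal
strategy $\sigma_\varepsilon^{o}$ when the play is in a state with positive zero-sum value for
player 1; otherwise it follows a perennial $\varepsilon$-spoiling strategy $\sigma_\varepsilon^{s}$. It is easy to
observe that since $\sigma_\varepsilon^{o}$ is perennial $\varepsilon$-optimal, so is $\sigma_\varepsilon$. Similarly we define the strategy
$\pi_\varepsilon$ as follows:

$$\pi_\varepsilon(s_0, s_1, \ldots, s_k) = \begin{cases} \pi_\varepsilon^{o}(s_0, s_1, \ldots, s_k) & \text{if } \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s_k) > 0, \\ \pi_\varepsilon^{s}(s_0, s_1, \ldots, s_k) & \text{if } \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s_k) = 0, \end{cases}$$

where $\pi_\varepsilon^{o}$ is a perennial $\varepsilon$-optimal strategy and $\pi_\varepsilon^{s}$ is a perennial $\varepsilon$-spoiling strategy for player 2.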
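
The construction above is a simple switch on the zero-sum value of the current state. A hedged Python sketch of that switch follows. Representing a strategy as a function from a history (a list of states) to a move, and the stub strategies and value table used below, are assumptions of this sketch; real perennial $\varepsilon$-optimal and $\varepsilon$-spoiling strategies would return distributions over moves and are not constructed here.

```python
def punishing_strategy(optimal, spoiling, value):
    """Combine two strategies: play `optimal` where the player's zero-sum value is
    positive at the current state, and `spoiling` where it is zero."""
    def strategy(history):
        current = history[-1]
        if value[current] > 0:
            return optimal(history)
        return spoiling(history)
    return strategy


# Hypothetical stand-ins for the two component strategies and the value function.
def optimal_stub(history):
    return "play-optimal"


def spoiling_stub(history):
    return "play-spoiling"


value1 = {"s0": 0.5, "s1": 0.0}
sigma = punishing_strategy(optimal_stub, spoiling_stub, value1)
```

The combined strategy consults the whole history only through the component strategies; the switch itself looks at the last state alone, as in the definition.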


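Lemma 6  Let $(\sigma, \pi) \in \Sigma \times \Pi$ be an arbitrary strategy profile, and let $\kappa = \{\, x \in \mathrm{Tr}_s^{\sigma,\pi} \mid \Pr_x^{\sigma,\pi}(\mathrm{Safe}(C)) > 0 \,\}$. For all nodes $x \in \mathrm{Tr}_s^{\sigma,\pi}$ the following
assertions hold:

1.  $\Pr_x^{\sigma,\pi}(\mathrm{Safe}(C)) = \Pr_x^{\sigma,\pi}(\mathrm{SafeTree}(\kappa))$.

2.  If $\Pr_x^{\sigma,\pi}(\mathrm{Safe}(C)) > 0$, then for every $\eta > 0$ there exists $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$
such that $\Pr_{x_1}^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)) \ge 1 - \eta$.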

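Proof.

1.  For every node $x_1 \in \overline{\kappa} = (\mathrm{Tr}_s^{\sigma,\pi} \setminus \kappa)$ we have $\Pr_{x_1}^{\sigma,\pi}(\mathrm{Reach}(U)) = 1$.
Hence for all nodes $x$ we have $\Pr_x^{\sigma,\pi}(\mathrm{Reach}(U) \mid \mathrm{ReachTree}(\overline{\kappa})) = 1$,
i.e., $\Pr_x^{\sigma,\pi}(\mathrm{Safe}(C) \mid \mathrm{ReachTree}(\overline{\kappa})) = 0$. For every
node $x_1 \in \kappa$, since $\Pr_{x_1}^{\sigma,\pi}(\mathrm{Safe}(C)) > 0$, we have $\langle x_1 \rangle \in C$. Thus for
every node $x$ we have $\Pr_x^{\sigma,\pi}(\mathrm{Safe}(C) \mid \mathrm{SafeTree}(\kappa)) = 1$. Hence we
have

$$\begin{aligned}
\Pr_x^{\sigma,\pi}(\mathrm{Safe}(C)) &= \Pr_x^{\sigma,\pi}(\mathrm{Safe}(C) \mid \mathrm{SafeTree}(\kappa)) \cdot \Pr_x^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)) \\
&\quad + \Pr_x^{\sigma,\pi}(\mathrm{Safe}(C) \mid \mathrm{ReachTree}(\overline{\kappa})) \cdot \Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\overline{\kappa})) \\
&= \Pr_x^{\sigma,\pi}(\mathrm{Safe}(C) \mid \mathrm{SafeTree}(\kappa)) \cdot \Pr_x^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)) \\
&= \Pr_x^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)).
\end{aligned}$$

The desired result follows.

2.  It follows from the above that for all nodes $x$, if $\Pr_x^{\sigma,\pi}(\mathrm{Safe}(C)) > 0$,
then $\Pr_x^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)) > 0$. Hence we must have
$\inf_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{ReachTree}(\overline{\kappa})) = 0$; otherwise, if
$\inf_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{ReachTree}(\overline{\kappa})) > 0$, then it follows from
Lemma 2 that $\Pr_x^{\sigma,\pi}(\mathrm{ReachTree}(\overline{\kappa})) = 1$, i.e., $\Pr_x^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)) = 0$.
Since $\inf_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{ReachTree}(\overline{\kappa})) = 0$ we have
$\sup_{x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)} \Pr_{x_1}^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)) = 1$. Hence for all $\eta > 0$ there
exists $x_1 \in \mathrm{Tr}_s^{\sigma,\pi}(x)$ such that $\Pr_{x_1}^{\sigma,\pi}(\mathrm{SafeTree}(\kappa)) \ge 1 - \eta$. ∎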


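Lemma 7  Let $x$ be a node in the stochastic tree $\mathrm{Tr}_s^{\sigma,\pi}$, let $\eta > 0$, and let $\mathcal{A}$
be an event such that $\Pr_x^{\sigma,\pi}(\mathcal{A}) \ge 1 - \eta$. For all objectives $\Psi$ the following
assertions hold:

1.  If $\Pr_x^{\sigma,\pi}(\Psi \mid \mathcal{A}) \ge 1 - \varepsilon$, then $\Pr_x^{\sigma,\pi}(\Psi) \ge 1 - \varepsilon - \eta$.

2.  If $\Pr_x^{\sigma,\pi}(\Psi \mid \mathcal{A}) \le \varepsilon$, then $\Pr_x^{\sigma,\pi}(\Psi) \le \varepsilon + \eta$.

Proof.

1.  If $\Pr_x^{\sigma,\pi}(\Psi \mid \mathcal{A}) \ge 1 - \varepsilon$, then

$$\begin{aligned}
\Pr_x^{\sigma,\pi}(\Psi) &\ge \Pr_x^{\sigma,\pi}(\Psi \cap \mathcal{A}) \\
&= \Pr_x^{\sigma,\pi}(\Psi \mid \mathcal{A}) \cdot \Pr_x^{\sigma,\pi}(\mathcal{A}) \\
&\ge (1 - \varepsilon) \cdot (1 - \eta) = 1 - \varepsilon - \eta + \eta \cdot \varepsilon \ \ge\ 1 - \varepsilon - \eta.
\end{aligned}$$

2.  If $\Pr_x^{\sigma,\pi}(\Psi \mid \mathcal{A}) \le \varepsilon$, then

$$\begin{aligned}
\Pr_x^{\sigma,\pi}(\Psi) &= \Pr_x^{\sigma,\pi}(\Psi \cap \mathcal{A}) + \Pr_x^{\sigma,\pi}(\Psi \cap \overline{\mathcal{A}}) \\
&\le \Pr_x^{\sigma,\pi}(\Psi \mid \mathcal{A}) \cdot \Pr_x^{\sigma,\pi}(\mathcal{A}) + \Pr_x^{\sigma,\pi}(\overline{\mathcal{A}}) \\
&\le \varepsilon + \eta.
\end{aligned}$$


Hence the Lemma follows. ∎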
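
The two bounds of Lemma 7 can be sanity-checked numerically. The sketch below uses exact rational arithmetic and evaluates the extremal situations only, namely $\Pr(\mathcal{A}) = 1 - \eta$ with the conditional probability on $\mathcal{A}$ at its bound and the worst possible behaviour outside $\mathcal{A}$; the function names and the grid of test values are assumptions of the sketch, and a finite check of this kind illustrates the lemma but does not prove it.

```python
from fractions import Fraction


def total_probability(p_given_a, p_a, p_given_not_a):
    """Pr(Psi) = Pr(Psi | A) Pr(A) + Pr(Psi | not A) Pr(not A)."""
    return p_given_a * p_a + p_given_not_a * (1 - p_a)


def check_lemma7(eps, eta):
    """Check both bounds in the extremal case Pr(A) = 1 - eta."""
    p_a = 1 - eta
    # Part 1: Pr(Psi | A) = 1 - eps, and Psi never holds outside A.
    low = total_probability(1 - eps, p_a, Fraction(0))
    # Part 2: Pr(Psi | A) = eps, and Psi always holds outside A.
    high = total_probability(eps, p_a, Fraction(1))
    return low >= 1 - eps - eta and high <= eps + eta


grid = [Fraction(i, 10) for i in range(11)]
all_ok = all(check_lemma7(e, h) for e in grid for h in grid)
```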

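Lemma 8  Suppose properties (P1-P4) of Lemma 1 are not satisfied. For all
states $s \in S$, we have $\Pr_s^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Reach}(U)) = 1$, where $\sigma_\varepsilon$ and $\pi_\varepsilon$ are punishing
perennial $\varepsilon$-optimal strategies and $\varepsilon \to 0$.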

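Proof.  Let

$$\alpha_1^{\min} = \min\{\, \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) \mid s \in S,\ \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) > 0 \,\}$$

denote the least positive zero-sum value for player 1 for a state $s \in S$; and

$$\alpha_1^{\max} = \max\{\, \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) \mid s \in S,\ \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) < 1 \,\}$$

denote the greatest zero-sum value for player 1 that is less than 1 for a state
$s \in S$. Similarly, let

$$\alpha_2^{\min} = \min\{\, \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) \mid s \in S,\ \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) > 0 \,\}$$

denote the least positive zero-sum value for player 2 for a state $s \in S$; and

$$\alpha_2^{\max} = \max\{\, \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) \mid s \in S,\ \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) < 1 \,\}$$

denote the greatest zero-sum value for player 2 that is less than 1 for a state
$s \in S$. Let

$$\alpha_{12}^{\max} = \max\{\, \langle\!\langle 1,2 \rangle\!\rangle_{val}(\Psi_1 \cap \Psi_2)(s) \mid s \in C,\ \langle\!\langle 1,2 \rangle\!\rangle_{val}(\Psi_1 \cap \Psi_2)(s) < 1 \,\}$$

denote the greatest cooperative value with objective $\Psi_1 \cap \Psi_2$ that is less
than 1, for a state $s \in C$. Let $\alpha = \min\{\, \alpha_1^{\min}, \alpha_2^{\min}, 1 - \alpha_1^{\max}, 1 - \alpha_2^{\max}, 1 - \alpha_{12}^{\max} \,\}$.
Note that $0 < \alpha \le \frac{1}{2}$. Fix $\beta$ such that $0 < 6\beta < \alpha$, and fix $\eta$ and $\varepsilon$
such that $0 < \varepsilon < \eta < \beta^2$.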

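We consider the perennial $\varepsilon$-optimal strategy profile $(\sigma_\varepsilon, \pi_\varepsilon)$ as described
by the punishing perennial $\varepsilon$-optimal strategy construction. Consider the set of nodes
$x \in \mathrm{Tr}_s^{\sigma_\varepsilon,\pi_\varepsilon}$ such that $\Pr_x^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Safe}(C)) > 0$. If this set is empty the Lemma follows.
Assume for the sake of contradiction that this set is non-empty.

Let $x$ be a node in this set, i.e., $\Pr_x^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Safe}(C)) > 0$. Let $\kappa = \{\, x_1 \in \mathrm{Tr}_s^{\sigma_\varepsilon,\pi_\varepsilon}(x) \mid \Pr_{x_1}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Safe}(C)) > 0 \,\}$.
Since $\Pr_x^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Safe}(C)) > 0$, it follows from Lemma 6
that $\Pr_x^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{SafeTree}(\kappa)) > 0$. Consider the event $\mathcal{A} = \mathrm{SafeTree}(\kappa)$. Let
$x_1 \in \mathrm{Tr}_s^{\sigma_\varepsilon,\pi_\varepsilon}(x)$ be such that one of the conditions (C1-C4) of Lemma 5 is
satisfied for $\varepsilon$, i.e., for every $x_2 \in \mathrm{Tr}_s^{\sigma_\varepsilon,\pi_\varepsilon}(x_1)$ one of the conditions (C1-C4)
of Lemma 5 holds for $\varepsilon$. Since $\mathcal{A} = \mathrm{SafeTree}(\kappa)$ and $x_1 \in \mathrm{Tr}_s^{\sigma_\varepsilon,\pi_\varepsilon}(x)$, we
have $x_1 \in \kappa$. Hence $\Pr_{x_1}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Safe}(C)) > 0$, and it follows from Lemma 6 that
$\Pr_{x_1}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{SafeTree}(\kappa)) > 0$. Again it follows from Lemma 6 that there is a
node $x_3 \in \mathrm{Tr}_s^{\sigma_\varepsilon,\pi_\varepsilon}(x_1)$ such that $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{SafeTree}(\kappa)) \ge 1 - \eta$, and $x_3$ also
satisfies one of the conditions (C1-C4) for $\varepsilon$. We analyze the following four
cases: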

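1.  If condition (C1) of Lemma 5 holds, then we have $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1 \mid \mathrm{SafeTree}(\kappa)) \ge 1 - \varepsilon$
and $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2 \mid \mathrm{SafeTree}(\kappa)) \ge 1 - \varepsilon$. Since
$\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{SafeTree}(\kappa)) \ge 1 - \eta$, from Lemma 7 we have $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1) \ge 1 - \varepsilon - \eta \ge 1 - 2\eta$
and $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2) \ge 1 - \varepsilon - \eta \ge 1 - 2\eta$. It follows
that $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1 \cap \Psi_2) \ge 1 - 4\eta \ge 1 - 4\beta^2 \ge 1 - 4\beta$. Let
$\langle x_3 \rangle = s_1 \in C$, and consider the strategy pair $(\sigma, \pi)$ defined as follows:

$$\sigma(s_0, s_1, \ldots, s_k) = \sigma_\varepsilon(\mathrm{hist}(x_3), s_0, s_1, \ldots, s_k)$$

and

$$\pi(s_0, s_1, \ldots, s_k) = \pi_\varepsilon(\mathrm{hist}(x_3), s_0, s_1, \ldots, s_k),$$

i.e., the strategies follow $\sigma_\varepsilon$ and $\pi_\varepsilon$ from $x_3$. Hence $\Pr_{s_1}^{\sigma,\pi}(\Psi_1 \cap \Psi_2) = \Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1 \cap \Psi_2) \ge 1 - 4\beta > 1 - \alpha \ge \alpha_{12}^{\max}$. Hence we must
have $\sup_{(\sigma,\pi) \in \Sigma \times \Pi} \Pr_{s_1}^{\sigma,\pi}(\Psi_1 \cap \Psi_2) = 1$, and thus the property (P1) of
Lemma 1 is satisfied.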

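2.  If condition (C2) of Lemma 5 holds, then we have $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1 \mid \mathrm{SafeTree}(\kappa)) \ge 1 - \varepsilon$
and $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2 \mid \mathrm{SafeTree}(\kappa)) \le \varepsilon$. Since
$\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{SafeTree}(\kappa)) \ge 1 - \eta$, from Lemma 7 we have $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1) \ge 1 - \varepsilon - \eta \ge 1 - 2\eta$
and $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2) \le \varepsilon + \eta \le 2\eta$. Let $W_0 = \{\, s \in S \mid \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) = 0 \,\}$ denote the set of states where the zero-sum
value for player 2 is 0, and let $\overline{W}_0 = S \setminus W_0$. Note that for every state
$s \in \overline{W}_0$ we have $\langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s) \ge \alpha_2^{\min} \ge \alpha$. Then we have

$$\begin{aligned}
2\eta \ \ge\ \Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2) &\ge \Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2 \cap \mathrm{Reach}(\overline{W}_0)) \\
&= \Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2 \mid \mathrm{Reach}(\overline{W}_0)) \cdot \Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Reach}(\overline{W}_0)) \\
&\ge (\alpha - \varepsilon) \cdot \Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Reach}(\overline{W}_0)),
\end{aligned}$$

where the last step holds since $\pi_\varepsilon$ follows a perennial $\varepsilon$-optimal strategy at states in $\overline{W}_0$.
Since $6\beta < \alpha$ and $\varepsilon < \eta < \beta^2 < \beta$ we have

$$\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Reach}(\overline{W}_0)) \ \le\ \frac{2\eta}{\alpha - \varepsilon} \ \le\ \frac{2\beta^2}{6\beta - \beta} \ =\ \frac{2\beta}{5} \ \le\ \beta.$$

Hence $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Safe}(W_0)) \ge 1 - \beta$. By construction of $\pi_\varepsilon$, if the current
state of the play is in $W_0$ then player 2 follows an $\varepsilon$-spoiling strategy
$\pi_\varepsilon^{s}$. Let $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1 \mid \mathrm{Safe}(W_0)) = p$. Then we have

$$1 - 2\eta \ \le\ \Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1) \ \le\ (1 - \beta)\,p + \beta.$$

Since $\eta < \beta^2 < \beta < \alpha \le \frac{1}{2}$ we have

$$p \ \ge\ \frac{1 - 2\eta - \beta}{1 - \beta} \ \ge\ \frac{1 - 3\beta}{1 - \beta} \ =\ 1 - \frac{2\beta}{1 - \beta} \ \ge\ 1 - 4\beta.$$

Hence we have $p \ge 1 - 4\beta > 1 - 6\beta + \varepsilon > 1 - \alpha + \varepsilon \ge \alpha_1^{\max} + \varepsilon$.
Since $\pi_\varepsilon^{s}$ is an $\varepsilon$-spoiling strategy it follows that $\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(\langle x_3 \rangle) = 1$.
Since $\langle x_3 \rangle \in C$, the property (P2) of Lemma 1 holds.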

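3.  An argument similar to the previous case shows that if condition (C3) of
Lemma 5 holds, then the property (P3) of Lemma 1 holds.

4.  If condition (C4) of Lemma 5 holds, then we have $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1 \mid \mathrm{SafeTree}(\kappa)) \le \varepsilon$
and $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2 \mid \mathrm{SafeTree}(\kappa)) \le \varepsilon$. Since
$\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{SafeTree}(\kappa)) \ge 1 - \eta$, from Lemma 7 we have $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1) \le \varepsilon + \eta \le 2\eta$
and $\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_2) \le \varepsilon + \eta \le 2\eta$. Since $\sigma_\varepsilon$ is a perennial
$\varepsilon$-optimal strategy and

$$\Pr_{x_3}^{\sigma_\varepsilon,\pi_\varepsilon}(\Psi_1) \ \le\ 2\eta \ <\ 3\eta - \varepsilon \ <\ 3\beta - \varepsilon \ <\ \alpha_1^{\min} - \varepsilon,$$

it follows that $\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(\langle x_3 \rangle) = 0$. Similarly we also have that
$\langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(\langle x_3 \rangle) = 0$. Since $\langle x_3 \rangle \in C$, the property (P4) of Lemma 1
holds.

Since by the assumption of the Lemma the properties (P1-P4) of Lemma 1 are
not satisfied, we have a contradiction. Hence there is no node $x$ with $\Pr_x^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Safe}(C)) > 0$. The Lemma
follows. ∎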

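Definition 11 (Locally optimal strategy)  A selector function $\overline{\sigma}$ is locally
optimal if it is optimal in the "one-step" matrix game where each state
is assigned a reward value $\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s)$. Formally, for all states $s$, for all
moves $a_2 \in \Gamma_2(s)$ we have

$$\langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s) \ \le\ \mathrm{E}\big[\, \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(\Theta_1) \mid s, \overline{\sigma}(s), a_2 \,\big],$$

where $\Theta_1$ denotes the state of the play after one step. Locally optimal selectors for player 2 are defined symmetrically. We denote
by $\Lambda_1^{\ell}$ and $\Lambda_2^{\ell}$ the set of locally optimal selectors for player 1 and player 2,
respectively. ∎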
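
The local optimality condition is a finite check once the zero-sum values are known. The following Python sketch tests it for player 1; the encoding of a selector as a dictionary from states to distributions over moves, the tolerance `tol`, and the one-state example (a matching-pennies style step whose value is assumed to be 0.5) are all assumptions of this sketch.

```python
def expected_value(value, dist):
    """Expected value of the successor state under a distribution over states."""
    return sum(p * value[t] for t, p in dist.items())


def is_locally_optimal(selector, value, moves2, delta, tol=1e-9):
    """Check value(s) <= E[value(next) | s, selector(s), a2] for every move a2 of player 2."""
    for s, mixed in selector.items():
        for a2 in moves2[s]:
            exp = sum(pa * expected_value(value, delta[(s, a1, a2)])
                      for a1, pa in mixed.items())
            if exp < value[s] - tol:
                return False
    return True


# Hypothetical one-step game at state "s": player 1 wins iff the moves "match".
value = {"s": 0.5, "win": 1.0, "lose": 0.0}
moves2 = {"s": ["c", "d"]}
delta = {
    ("s", "a", "c"): {"win": 1.0}, ("s", "a", "d"): {"lose": 1.0},
    ("s", "b", "c"): {"lose": 1.0}, ("s", "b", "d"): {"win": 1.0},
}
uniform = {"s": {"a": 0.5, "b": 0.5}}
pure_a = {"s": {"a": 1.0}}
uniform_ok = is_locally_optimal(uniform, value, moves2, delta)
pure_ok = is_locally_optimal(pure_a, value, moves2, delta)
```

Here the uniform selector guarantees expected value 0.5 against both moves of player 2, whereas the pure selector `a` obtains 0 against move `d` and is therefore not locally optimal.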

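The following Lemma is an easy consequence of Lemma 8 and Theorem 3
of [2].

Lemma 9  For every $\varepsilon > 0$, there exists a perennial $\varepsilon$-optimal strategy profile
$(\sigma_\varepsilon, \pi_\varepsilon)$ and there exists a pair of locally optimal selectors $(\overline{\sigma}, \overline{\pi}) \in \Lambda_1^{\ell} \times \Lambda_2^{\ell}$
such that the following conditions hold:

1.  $\lim_{\varepsilon \to 0} \sigma_\varepsilon = \overline{\sigma}$; $\lim_{\varepsilon \to 0} \pi_\varepsilon = \overline{\pi}$.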

2.  For  all  state  s,  ''^^{Reach{U))  =  1. 

In the sequel, by a slight abuse of notation, we also denote by $\overline{\sigma}$ and $\overline{\pi}$ the
memoryless strategies constructed from the locally optimal selectors $\overline{\sigma}$ and
$\overline{\pi}$ of Lemma 9.

Notation.  For notational simplicity we denote by $v_1(s)$ and $v_2(s)$ the
zero-sum values of the games, i.e., $v_1(s) = \langle\!\langle 1 \rangle\!\rangle_{val}(\Psi_1)(s)$ and $v_2(s) = \langle\!\langle 2 \rangle\!\rangle_{val}(\Psi_2)(s)$.

1.  For a play $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ we write $e_C(\omega) = \inf\{\, n \ge 1 \mid s_n \notin C \,\}$
to denote the first time the play leaves $C$.

2.  For a memoryless strategy $x$ of player 1 we denote by $x_s$ the distribution
described by the strategy $x$ at state $s$. Similar notation is used
for memoryless strategies $y$ of player 2.

3.  For a memoryless strategy $y$ for player 2 we define $H_1(y, C) = \max_{s \in C} \max_{a \in \Gamma_1(s)} \mathrm{E}[\, v_1(\Theta_1) \mid s, a, y_s \,]$.

4.  For a memoryless strategy $x$ for player 1 we define $H_2(x, C) = \max_{s \in C} \max_{b \in \Gamma_2(s)} \mathrm{E}[\, v_2(\Theta_1) \mid s, b, x_s \,]$.
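
As a small illustration of item 3, the following Python sketch evaluates $H_1(y, C)$ by enumerating states of $C$ and moves of player 1. The dictionary encoding of the game, the function name `H1` and the one-state example are assumptions of this sketch; $\Theta_1$ is read as the state after one step.

```python
def H1(C, moves1, y, v1, delta):
    """Best one-step expected value player 1 can obtain from some state of C
    with some pure move, against the memoryless strategy y of player 2."""
    best = float("-inf")
    for s in C:
        for a in moves1[s]:
            exp = sum(
                pb * sum(p * v1[t] for t, p in delta[(s, a, b)].items())
                for b, pb in y[s].items()
            )
            best = max(best, exp)
    return best


# Hypothetical one-state example: player 1 wins iff the moves "match".
v1 = {"s": 0.5, "win": 1.0, "lose": 0.0}
moves1 = {"s": ["a", "b"]}
delta = {
    ("s", "a", "c"): {"win": 1.0}, ("s", "a", "d"): {"lose": 1.0},
    ("s", "b", "c"): {"lose": 1.0}, ("s", "b", "d"): {"win": 1.0},
}
h_uniform = H1({"s"}, moves1, {"s": {"c": 0.5, "d": 0.5}}, v1, delta)
h_pure = H1({"s"}, moves1, {"s": {"c": 1.0}}, v1, delta)
```

Against the uniform strategy of player 2 every move of player 1 yields 0.5, while against the pure strategy `c` the move `a` yields 1.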

Definition 12 (Perturbation)  Let $\mu$ and $\widetilde{\mu}$ be two distributions over $S$;
then $\widetilde{\mu}$ is a perturbation of $\mu$ if $\mathrm{Supp}(\mu) \subseteq \mathrm{Supp}(\widetilde{\mu})$. ∎

Definition 13 (Perturbed graph [29])  Given a pair of memoryless
strategies $(x, y)$ and a subset $C \subseteq S$, the perturbed graph $G_C(x, y)$ is a
directed graph defined as follows:

•  the set of states is $C$;

•  for $s, s' \in C$, there is an edge $(s, s')$ if there exists a perturbation $(\widetilde{x}_s, \widetilde{y}_s)$
of $(x_s, y_s)$ such that $\delta(s' \mid s, \widetilde{x}_s, \widetilde{y}_s) > 0$ and $\delta(C \mid s, \widetilde{x}_s, \widetilde{y}_s) = 1$.

Intuitively, the definition captures the idea that the players can play a perturbation
of $(x, y)$ to reach $s'$ from $s$ without leaving the set $C$. ∎

Definition 14 (Weak-communicating sets [23, 29])  Let $(x, y)$ be a
pair of memoryless strategies. A set $C \subseteq S$ is weak-communicating under
$(x, y)$ if the graph $G_C(x, y)$ is strongly connected. ∎
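
Definitions 13 and 14 can be turned into a small decision procedure. The Python sketch below rests on one observation that is an assumption of the sketch rather than a statement from the text: since a perturbation only needs its support to contain the original support, an edge $(s, s')$ exists exactly when some move pair $(a, b)$ can be added to the supports of $x_s$ and $y_s$ such that every pair of moves in the enlarged supports stays in $C$ and some such pair reaches $s'$ with positive probability. All names and the two example games are hypothetical.

```python
def support(dist):
    return {m for m, p in dist.items() if p > 0}


def stays_in(C, s, A, B, delta):
    """True iff every move pair in A x B keeps the play inside C from s."""
    return all(t in C
               for a in A for b in B
               for t, p in delta[(s, a, b)].items() if p > 0)


def perturbed_graph(C, x, y, moves1, moves2, delta):
    edges = set()
    for s in C:
        sx, sy = support(x[s]), support(y[s])
        for a in moves1[s]:
            for b in moves2[s]:
                A, B = sx | {a}, sy | {b}  # supports of a candidate perturbation
                if not stays_in(C, s, A, B, delta):
                    continue
                for a2 in A:
                    for b2 in B:
                        for t, p in delta[(s, a2, b2)].items():
                            if p > 0:
                                edges.add((s, t))
    return edges


def is_weak_communicating(C, x, y, moves1, moves2, delta):
    edges = perturbed_graph(C, x, y, moves1, moves2, delta)

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for p, q in edges:
                if p == u and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    # Strongly connected: every state of C reaches all of C.
    return all(reachable(s) >= set(C) for s in C)


C = {"s", "t"}
moves1 = {"s": ["a", "b"], "t": ["a"]}
moves2 = {"s": ["c"], "t": ["c"]}
x = {"s": {"a": 1.0}, "t": {"a": 1.0}}
y = {"s": {"c": 1.0}, "t": {"c": 1.0}}
delta_ok = {("s", "a", "c"): {"s": 1.0}, ("s", "b", "c"): {"t": 1.0},
            ("t", "a", "c"): {"s": 0.5, "t": 0.5}}
delta_leaky = {("s", "a", "c"): {"s": 1.0}, ("s", "b", "c"): {"t": 0.5, "out": 0.5},
               ("t", "a", "c"): {"s": 0.5, "t": 0.5}}
ok = is_weak_communicating(C, x, y, moves1, moves2, delta_ok)
leaky = is_weak_communicating(C, x, y, moves1, moves2, delta_leaky)
```

In the first game the strategy $x$ itself never moves from `s` to `t`, but a perturbation that puts weight on move `b` does so without leaving $C$, so $C$ is weak-communicating. In the second game the same move leaves $C$ with positive probability, so the edge from `s` to `t` is missing.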

Intuitively, a set $C$ being weak-communicating under $(x, y)$ means that by playing a
perturbation $(\widetilde{x}, \widetilde{y})$ of $(x, y)$ every state $s \in C$ is reached almost surely without
leaving $C$. This is formalized in the Lemma below.

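Lemma 10  Let $C$ be a weak-communicating set under a memoryless strategy
pair $(x, y)$. There exist memoryless strategies $(\widetilde{x}, \widetilde{y})$ such that

1.  for each $s \in C$, $(\widetilde{x}_s, \widetilde{y}_s)$ is a perturbation of $(x_s, y_s)$;

2.  $C$ is closed under $(\widetilde{x}, \widetilde{y})$, i.e., for all states $s \in C$ we have $\delta(C \mid s, \widetilde{x}_s, \widetilde{y}_s) = 1$;

3.  for all $s \in C$, $s$ is reached almost surely (with probability 1) in finite
time from every state $s' \in C$, under $(\widetilde{x}, \widetilde{y})$.

Proof.  For every edge $e = (s, s')$ in $G_C(x, y)$ there is a perturbation $(\widetilde{x}_e, \widetilde{y}_e)$
of $(x_s, y_s)$ such that $\delta(s' \mid s, \widetilde{x}_e, \widetilde{y}_e) > 0$ and $\delta(C \mid s, \widetilde{x}_e, \widetilde{y}_e) = 1$. For any state
$s$ with out-going edges $e_1, e_2, \ldots, e_k$, let $(\widetilde{x}_{e_1}, \widetilde{y}_{e_1}), (\widetilde{x}_{e_2}, \widetilde{y}_{e_2}), \ldots, (\widetilde{x}_{e_k}, \widetilde{y}_{e_k})$
be the corresponding perturbations with the property described. Define the
perturbation $(\widetilde{x}, \widetilde{y})$ as follows:

$$\widetilde{x}_s = \alpha_1 \widetilde{x}_{e_1} + \alpha_2 \widetilde{x}_{e_2} + \cdots + \alpha_k \widetilde{x}_{e_k} \quad \text{such that } \alpha_i > 0,\ \sum_{i=1}^{k} \alpha_i = 1;$$

$$\widetilde{y}_s = \beta_1 \widetilde{y}_{e_1} + \beta_2 \widetilde{y}_{e_2} + \cdots + \beta_k \widetilde{y}_{e_k} \quad \text{such that } \beta_i > 0,\ \sum_{i=1}^{k} \beta_i = 1.$$

Since $(\widetilde{x}_s, \widetilde{y}_s)$ is a convex combination of the perturbations it follows that
$\delta(C \mid s, \widetilde{x}_s, \widetilde{y}_s) = 1$, for all $s \in C$. Observe that $C$ is a closed recurrent class
under the strategy pair $(\widetilde{x}, \widetilde{y})$. Hence the desired result follows. ∎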


Lemma 11 The following assertions hold.

1. Let $s \in S$, and a memoryless strategy $y$ for player 2 be given. There exists $a \in \Gamma_1(s)$ such that

\[ \mathrm{E}[v_1(\Theta_1) \mid s,a,y_s] \ge v_1(s). \]

2. Let $C \subseteq S$, and a memoryless strategy $y$ for player 2 be given. Then

\[ H_1(y,C) \ge \max_{s \in C} v_1(s). \]

Proof.

1. It follows from the results of [7] that the zero-sum values $v_1(\cdot)$ are characterized by fixed points of a matrix game. The result then follows from the fact that in any matrix game, given a distribution $y$ for player 2 there is an optimal move $a$ that maximizes the expected one-step payoff for player 1.

2. Follows from part 1 and the definition of $H_1(y,C)$. ■
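The matrix-game fact used in part 1 can be illustrated as follows. This is a minimal sketch under assumed names: `payoff[a][b]` stands for the one-step expectation $\mathrm{E}[v_1(\Theta_1) \mid s,a,b]$ at a fixed state $s$, and the numbers are a hypothetical example rather than data from the paper.

```python
def best_pure_reply(payoff, y):
    """Return a pure move of player 1 maximizing the expected payoff against y.

    payoff[a][b] is the payoff to player 1 for the move pair (a, b);
    y[b] is the probability with which player 2 plays b.
    """
    expected = {a: sum(y[b] * row[b] for b in y) for a, row in payoff.items()}
    best = max(expected, key=expected.get)
    return best, expected[best]


# Hypothetical 2x2 matrix game whose value is 1/2 (both players mixing uniformly).
payoff = {"H": {"H": 1.0, "T": 0.0}, "T": {"H": 0.0, "T": 1.0}}
move, value = best_pure_reply(payoff, {"H": 0.7, "T": 0.3})
print(move, value)
```

Against the non-optimal distribution above, the pure reply `"H"` already secures more than the value 1/2 of the matrix game, and against any distribution some pure reply secures at least the value, which is the inequality of part 1.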

Definition 15 (Exit distributions [22, 23]) Given $C \subseteq S$, an exit distribution from $C$ is a distribution $q \in \mathcal{D}(S)$ such that $q(C) < 1$, i.e., $\sum_{s \in C} q(s) < 1$. Let $(x,y)$ be a pair of memoryless strategies and $C \subseteq S$ be given. We define unilateral and joint exits as follows:

1. Player 1 unilateral exit:

\[ Q_1^C(x,y) = \{\, \delta(\cdot \mid s,a,y_s) \ \text{where } s \in C,\ a \in \Gamma_1(s),\ \delta(C \mid s,a,y_s) < 1 \,\} \]

i.e., player 1 forces the play out of $C$ with positive probability by playing move $a$ against the memoryless strategy $y$.

2. Player 2 unilateral exit:

\[ Q_2^C(x,y) = \{\, \delta(\cdot \mid s,x_s,b) \ \text{where } s \in C,\ b \in \Gamma_2(s),\ \delta(C \mid s,x_s,b) < 1 \,\} \]

i.e., player 2 forces the play out of $C$ with positive probability by playing move $b$ against the memoryless strategy $x$.

3. Joint exit of the players:

\[ Q_3^C(x,y) = \{\, \delta(\cdot \mid s,a,b) \ \text{where } s \in C,\ a \in \Gamma_1(s),\ b \in \Gamma_2(s),\ \delta(C \mid s,a,y_s) = \delta(C \mid s,x_s,b) = 1, \ \text{and } \delta(C \mid s,a,b) < 1 \,\} \]

i.e., playing $a$ against $y$, and $b$ against $x$, keeps the play in $C$ with probability 1, but by playing $a$ and $b$ jointly the players can ensure that the play leaves $C$ with positive probability.

Let $Q^C(x,y) = \text{convex-hull}(Q_1^C(x,y) \cup Q_2^C(x,y) \cup Q_3^C(x,y))$ denote the set of convex combinations of the unilateral and joint exit distributions. Every distribution $Q \in Q^C(x,y)$ can be represented as

\[ Q = \sum_{l_1 \in L_1} \eta_{l_1} P_{l_1} + \sum_{l_2 \in L_2} \eta_{l_2} P_{l_2} + \sum_{l_3 \in L_3} \eta_{l_3} P_{l_3} \]

where $P_{l_j} \in Q_j^C(x,y)$ for $l_j \in L_j$. ■

For a distribution $Q \in \mathcal{D}(S)$ and a payoff $\gamma$ for all states, by $\mathrm{E}_Q[\gamma(\Theta_1)]$ we denote $\sum_{s \in S} \gamma(s) \cdot Q(s)$.
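To make the definitions concrete, the following sketch enumerates the unilateral exits of player 1 and evaluates $\mathrm{E}_Q[\gamma(\Theta_1)]$ on a one-state example. The dictionary encoding of $\delta$, the function names and the toy game are assumptions introduced only for illustration.

```python
def mix_over_b(delta, s, a, y_s):
    """Distribution delta(. | s, a, y_s): player 1 plays a, player 2 mixes by y_s."""
    dist = {}
    for b, q in y_s.items():
        for t, p in delta[s][(a, b)].items():
            dist[t] = dist.get(t, 0.0) + q * p
    return dist


def player1_unilateral_exits(delta, y, C):
    """All pairs ((s, a), delta(. | s, a, y_s)) with s in C that leave C with positive probability."""
    exits = []
    for s in C:
        moves = {a for (a, _b) in delta[s]}
        for a in sorted(moves):
            dist = mix_over_b(delta, s, a, y[s])
            if sum(p for t, p in dist.items() if t in C) < 1.0 - 1e-12:
                exits.append(((s, a), dist))
    return exits


def expected_payoff(Q, gamma):
    """E_Q[gamma(Theta_1)] = sum over states s of gamma(s) * Q(s)."""
    return sum(gamma[s] * q for s, q in Q.items())


# Hypothetical game fragment: C = {0}; "exit" leads to "win" or "lose" depending on player 2.
delta = {
    0: {
        ("stay", "l"): {0: 1.0}, ("stay", "r"): {0: 1.0},
        ("exit", "l"): {"win": 1.0}, ("exit", "r"): {"lose": 1.0},
    }
}
y = {0: {"l": 0.25, "r": 0.75}}
gamma = {0: 0.0, "win": 1.0, "lose": 0.0}

exits = player1_unilateral_exits(delta, y, {0})
print(exits)
print(expected_payoff(exits[0][1], gamma))
```

Here the only unilateral exit of player 1 is the move `"exit"` at state 0; the move `"stay"` keeps the play in $C$ with probability 1 and therefore contributes no exit distribution.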

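Definition 16 (Controllable exit distributions and controllable sets [22, 23]) A distribution $Q \in \mathcal{D}(S)$ is a controllable exit distribution from $C$ w.r.t. a payoff vector $\gamma = (\gamma_1,\gamma_2)$, if for every $\varepsilon > 0$, there exist a strategy pair $(\sigma_\varepsilon,\pi_\varepsilon)$ and two bounded stopping times $\tau_1$ and $\tau_2$ such that for all $s \in C$ the following conditions hold:

1. $\Pr_s^{\sigma_\varepsilon,\pi_\varepsilon}(e_C < \infty) = 1$, i.e., the play leaves $C$ with probability 1.

2. $\Pr_s^{\sigma_\varepsilon,\pi_\varepsilon}(\Theta_{e_C} = s') = Q(s')$, i.e., the exit distribution from $C$ is the distribution $Q$.

3. $\Pr_s^{\sigma_\varepsilon,\pi_\varepsilon}(\min\{\tau_1,\tau_2\} < e_C) < \varepsilon$, i.e., the stopping times are smaller than the exit time with small probability $\varepsilon$.

4. For all strategies $\sigma$,

\[ \mathrm{E}_s^{\sigma,\pi_\varepsilon}\big[\gamma_1(\Theta_{e_C})\,\mathbf{1}_{(e_C < \tau_1)}\big] + \mathrm{E}_s^{\sigma,\pi_\varepsilon}\big[v_1(\Theta_{\tau_1})\,\mathbf{1}_{(e_C \ge \tau_1)}\big] \le \mathrm{E}_Q[\gamma_1(\Theta_1)] + \varepsilon, \]

where $\mathbf{1}_{(e_C < \tau_1)}$ is the indicator function of the event $\{ e_C < \tau_1 \}$, and $\mathbf{1}_{(e_C \ge \tau_1)}$ is the indicator function of the event $\{ e_C \ge \tau_1 \}$. Intuitively, if the play leaves $C$ within time $\tau_1$ then the payoff is defined by the exit distribution and the payoff $\gamma_1$, and if the game stays in $C$ for more than $\tau_1$ steps, then the payoff is defined by the distribution at time $\tau_1$ and the payoff $v_1$. Then the expected payoff is at most $\mathrm{E}_Q[\gamma_1(\Theta_1)] + \varepsilon$. Similarly, for all strategies $\pi$,

\[ \mathrm{E}_s^{\sigma_\varepsilon,\pi}\big[\gamma_2(\Theta_{e_C})\,\mathbf{1}_{(e_C < \tau_2)}\big] + \mathrm{E}_s^{\sigma_\varepsilon,\pi}\big[v_2(\Theta_{\tau_2})\,\mathbf{1}_{(e_C \ge \tau_2)}\big] \le \mathrm{E}_Q[\gamma_2(\Theta_1)] + \varepsilon, \]

where $\mathbf{1}_{(e_C < \tau_2)}$ is the indicator function of the event $\{ e_C < \tau_2 \}$, and $\mathbf{1}_{(e_C \ge \tau_2)}$ is the indicator function of the event $\{ e_C \ge \tau_2 \}$.

A controlled set $(C,Q)$ consists of a set $C \subseteq S$ and a distribution $Q$ that is a controllable exit distribution for any payoff vector $\gamma \ge v = (v_1,v_2)$. ■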

A  notion  that  complements  the  notion  of  controlled  set  is  a  blocking  pair. 
We  will  establish  the  relation  between  a  blocking  pair  and  a  controlled  set 
in  Lemma  13  and  Lemma  15. 

Definition 17 (Blocking pairs) Let $D \subseteq S$, and $y$ be a memoryless strategy for player 2. The pair $(y,D)$ is a blocking pair for player 1 (i.e., player 2 blocks) if for all $s \in D$, and for all $a \in \Gamma_1(s)$ we have

\[ \delta(D \mid s,a,y_s) < 1 \ \Rightarrow\ \mathrm{E}[v_1(\Theta_1) \mid s,a,y_s] < \max_{s' \in D} v_1(s'). \]

Informally, by playing the strategy $y$ player 2 ensures that if player 1 leaves the set $D$, then the expected payoff for player 1, assuming every state $s$ has reward $v_1(s)$, is less than the maximum value $v_1(\cdot)$ of player 1 in $D$. A blocking pair $(x,D)$ for player 2 (i.e., player 1 blocks) is defined by exchanging the roles of the players. ■

Reduced game. Let $(C,Q)$ be any controlled set. Then the game $\mathcal{G}_C$ is obtained from $\mathcal{G}$ by collapsing the set $C$ to a single dummy state $\{C\}$, with the transition function at $\{C\}$ defined by the controllable exit distribution $Q$, i.e., at state $\{C\}$ the players have only a single move $*$ and the transition function at state $\{C\}$ given the moves $(*,*)$ is given by the distribution $Q$. Hence the state space of $\mathcal{G}_C$ is $(S \setminus C) \cup \{C\}$ and we write $\delta_C$ for the transition function of $\mathcal{G}_C$. We refer to this process as collapsing of a controlled set.

Definition 18 (Reduced blocking pair) A reduced blocking pair is a blocking pair in a reduced game. A pair $(y,D)$ is a reduced blocking pair in a game $\mathcal{G}_C$ if for all states $s \in D$, for all $a \in \Gamma_1(s)$ we have

\[ \delta_C(D \mid s,a,y_s) < 1 \ \Rightarrow\ \mathrm{E}_{\delta_C}[v_1(\Theta_1) \mid s,a,y_s] < \max_{s' \in D} v_1(s'). \]

Note that $v_1(s)$ is the value of the original game and not the reduced game.
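The collapsing step described under "Reduced game" can be sketched as a transformation of the transition function. The encoding below (nested dictionaries, the dummy state name `"{C}"`, the move name `"*"`) and the example exit distribution are assumptions for illustration; the sketch also assumes that $Q$ puts no mass inside $C$, since the play leaves $C$ with probability 1.

```python
def collapse(delta, C, Q, dummy="{C}"):
    """Collapse the set C into a single dummy state whose only move pair leads to Q.

    delta[s][(a, b)] is a dict {successor: probability}. Transitions that enter C
    are redirected to the dummy state. Q is assumed to be supported outside C.
    """
    if any(t in C for t, p in Q.items() if p > 0):
        raise ValueError("the exit distribution must be supported outside C")

    def redirect(dist):
        out = {}
        for t, p in dist.items():
            key = dummy if t in C else t
            out[key] = out.get(key, 0.0) + p
        return out

    reduced = {
        s: {move_pair: redirect(dist) for move_pair, dist in moves.items()}
        for s, moves in delta.items()
        if s not in C
    }
    reduced[dummy] = {("*", "*"): dict(Q)}
    return reduced


# Hypothetical game: states 0 and 1 form the controlled set, 3 is absorbing.
delta = {
    0: {("a", "b"): {1: 1.0}},
    1: {("a", "b"): {0: 0.5, 2: 0.5}},
    2: {("a", "b"): {0: 0.5, 3: 0.5}},
    3: {("a", "b"): {3: 1.0}},
}
Q = {2: 0.4, 3: 0.6}  # hypothetical controllable exit distribution from {0, 1}
reduced = collapse(delta, {0, 1}, Q)
print(reduced)
```

The reduced game has state space $(S \setminus C) \cup \{C\}$, and every reduction step removes at least the moves available inside $C$.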


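The following proof is similar to Vieille's proof [29].

Lemma 12 ([29]) Let $(\overline{\sigma},D)$ be a blocking pair for player 2. Then there exists $\overline{D} \subseteq D$ such that

1. (C1) $v_2(\cdot)$ is constant in $\overline{D}$, i.e., for all $s,s' \in \overline{D}$ we have $v_2(s) = v_2(s')$.

2. (C2) $\overline{D}$ is weak-communicating w.r.t. $(\overline{\sigma},\overline{\pi})$.

3. (C3) $(\overline{\sigma},\overline{D})$ is a blocking pair for player 2.

Proof. Let $\widehat{D} = \{\, s \in D \mid v_2(s) = \max_{s' \in D} v_2(s') \,\}$ be the set of states where $v_2$ is maximum. Since $(\overline{\sigma},D)$ is a blocking pair for player 2 it follows immediately that $(\overline{\sigma},\widehat{D})$ is also a blocking pair for player 2. Consider the perturbed graph $G_{\widehat{D}}(\overline{\sigma},\overline{\pi})$ and let $\overline{D}$ be a terminal strongly connected end-component in the graph. There is no edge out of $\overline{D}$, so $\overline{D}$ is closed and weak-communicating. Since $\overline{D} \subseteq \widehat{D}$ it follows that $v_2(\cdot)$ is constant in $\overline{D}$. ■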

The following lemmas provide the basic principle of a reduction mechanism for the original game.

Lemma 13 ([29]) Let $(\overline{\sigma},D)$ be a blocking pair for player 2. Then there exists $\overline{D} \subseteq D$ such that one of the following two conditions holds:

1. $\overline{D}$ is a controlled set.

2. $(\overline{\sigma},\overline{D})$ and $(\overline{\pi},\overline{D})$ are blocking pairs, $\overline{D}$ is weak-communicating under $(\overline{\sigma},\overline{\pi})$, and $v_1$ and $v_2$ are constant in $\overline{D}$.

Proof. Given that $(\overline{\sigma},D)$ is a blocking pair, consider $(\overline{\sigma},\overline{D})$ that satisfies conditions (C1)-(C3) of Lemma 12.

1. If $\delta(\overline{D} \mid s,a,\overline{\pi}_s) < 1$ and $\mathrm{E}[v_1(\Theta_1) \mid s,a,\overline{\pi}_s] \ge \max_{s' \in \overline{D}} v_1(s')$, for some state $s$ and $a \in \Gamma_1(s)$, then we verify that $\overline{D}$ is a controlled set. Choose $s^*, a^*$ maximizing $\mathrm{E}[v_1(\Theta_1) \mid s,a,\overline{\pi}_s]$. Then by construction we have

\[ \mathrm{E}[v_1(\Theta_1) \mid s^*,a^*,\overline{\pi}_{s^*}] \ge H_1(\overline{\pi},\overline{D}). \]

For player 2 notice that

\[ \mathrm{E}[v_2(\Theta_1) \mid s^*,a^*,\overline{\pi}_{s^*}] \ge v_2(s^*) = \max_{s \in \overline{D}} v_2(s) \quad (\text{since } v_2 \text{ is constant in } \overline{D}). \]

Hence it follows that playing $\overline{\sigma}$ with perturbation to $a^*$ yields a controllable exit distribution.

2. Otherwise, note that $(\overline{\pi},\overline{D})$ is a blocking pair for player 1. Apply Lemma 12 to $(\overline{\pi},\overline{D})$ with the roles of the players exchanged and let $\overline{D}' \subseteq \overline{D}$ be the corresponding subset. Then as above,

• either there exists an exit distribution using unilateral exits of player 2; or

• $(\overline{\sigma},\overline{D}')$ is a blocking pair for player 2. Moreover, $(\overline{\sigma},\overline{D}')$ and $(\overline{\pi},\overline{D}')$ satisfy all the desired conditions. ■

We present a sketch of the proof of the following lemma as described in [22, 23]. For details see [22, 23].

Lemma 14 ([22, 23]) Let $C \subseteq S$ and let $(x,C)$ and $(y,C)$ be blocking pairs and $Q \in \text{convex-hull}(Q_3^C(x,y))$ be an exit distribution. Let $C$ be weak-communicating under $(x,y)$. Assume that the following conditions hold:

1. Let $\gamma$ be a payoff vector such that $\gamma_i(s) \ge v_i(s)$ for all states $s \in C$, for $i = 1,2$.

2. For all $s \in C$, for any $a \in \Gamma_1(s)$ we have $\mathrm{E}[v_1(\Theta_1) \mid s,a,y_s] \le \mathrm{E}_Q[\gamma_1(\Theta_1)]$.

3. For all $s \in C$, for any $b \in \Gamma_2(s)$ we have $\mathrm{E}[v_2(\Theta_1) \mid s,x_s,b] \le \mathrm{E}_Q[\gamma_2(\Theta_1)]$.

Then $Q$ is a controllable exit distribution.

Proof (Sketch). Fix $\beta > 0$ and $\varepsilon > 0$ to be sufficiently small. By definition of weak communication we have that for all $s \in C$ there exists $(\widetilde{x},\widetilde{y})$ such that

• $\|(\widetilde{x},\widetilde{y}) - (x,y)\| < \alpha$.

• if both players play $(\widetilde{x},\widetilde{y})$ then the game leaves $C$ with probability 0, and $s$ is reached with probability 1 in finite time from any state $s' \in C$.

The strategy $(\sigma,\pi)$ is defined as follows. In a cyclic manner do the following for the exit distributions $P_{l_3}$ for $l_3 \in L_3$.

• Step 1. Denote by $z$ the state where the joint exit distribution occurs. Play $(\widetilde{x},\widetilde{y})$ until the game reaches $z$.

• Step 2. Let $\alpha = \beta \cdot \eta_{l_3}$. At $z$ play the following strategy

\[ \big( (1 - \sqrt{\alpha})\,x + \sqrt{\alpha}\,a,\ (1 - \sqrt{\alpha})\,y + \sqrt{\alpha}\,b \big) \]

• Step 3. Continue cyclically.

Stopping times $\tau_1$ and $\tau_2$.

• If player 1 (resp. player 2) plays a move that is not compatible with $\sigma$ (resp. $\pi$) then $\tau_1$ (resp. $\tau_2$) is stopped.

• For every $l_3 \in L_3$ consider all stages in which the play has been in Step 2 and check whether the opponent has perturbed to $a$ (or to $b$) approximately with the specified frequency, i.e., the ratio of $\sqrt{\alpha}$ and the number of times the move played by player 1 (resp. player 2) was $a$ (resp. $b$) is in $(1 - \varepsilon, 1 + \varepsilon)$, for small $\varepsilon$.

The statistical test is done only if the number of rounds the players have spent in Step 2 is sufficiently large, so that the probability of false detection of deviation is small.

If $\alpha$ and $\varepsilon$ are sufficiently small, the test can be employed effectively, since exiting $C$ occurs after $O(\frac{1}{\alpha})$ stages whereas the players should perturb with probability $\sqrt{\alpha}$. Hence until the exit occurs, each player should perturb $O(\frac{1}{\sqrt{\alpha}})$ times, which is enough for the statistical test. Once the statistical test fails, or the stopping time is reached, the players play the spoiling strategies of the zero-sum games, ensuring that the other player's payoff is no more than her value in the zero-sum game.

Since the players ultimately switch to their spoiling strategies, no player has a unilateral incentive to stay in $C$. Thus there is no profitable deviation for the players and hence $Q$ is a controllable exit distribution. ■
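The statistical test in the sketch above can be illustrated by a simple frequency check. This is a hedged illustration only: the function name, the thresholds, and the choice to compare the observed count with the expected count `perturb_prob * n` are assumptions of this sketch, not the construction of [22, 23].

```python
def deviation_detected(step2_moves, target, perturb_prob, eps, min_rounds):
    """Frequency test on the opponent's moves observed during Step 2.

    step2_moves  : moves the opponent played in the rounds spent in Step 2
    target       : the move the opponent is supposed to perturb to
    perturb_prob : prescribed perturbation probability (sqrt(alpha) in the text)
    eps          : tolerance; the ratio observed / expected must lie in (1 - eps, 1 + eps)
    min_rounds   : the test is applied only after this many Step 2 rounds
    """
    n = len(step2_moves)
    if n < min_rounds:
        return False  # too few samples: testing now would risk a false detection
    observed = sum(1 for m in step2_moves if m == target)
    ratio = observed / (perturb_prob * n)
    return not (1 - eps < ratio < 1 + eps)


# Hypothetical traces with prescribed perturbation probability 0.1.
honest = ["a"] * 100 + ["x"] * 900   # perturbs in 10% of the rounds
silent = ["x"] * 1000                # never perturbs
short = ["x"] * 50                   # too short to be tested

print(deviation_detected(honest, "a", 0.1, 0.2, 100))
print(deviation_detected(silent, "a", 0.1, 0.2, 100))
print(deviation_detected(short, "a", 0.1, 0.2, 100))
```

A player who never perturbs is detected once enough Step 2 rounds have been observed, after which the opponent would switch to a spoiling strategy.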

The following lemma is a result of Vieille whose proof uses Solan's result.




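Lemma 15 ([29]) Let $(\overline{\sigma},D)$ and $(\overline{\pi},D)$ be blocking pairs such that $D$ is weak-communicating under $(\overline{\sigma},\overline{\pi})$. Then there is a controllable exit distribution $Q$ such that $Q \in \text{convex-hull}(Q_3^D(\overline{\sigma},\overline{\pi}))$ and $(D,Q)$ is a controlled set.

Proof. Consider the perennial $\varepsilon$-optimal strategies $\sigma_\varepsilon$ and $\pi_\varepsilon$ such that $\Pr_s^{\sigma_\varepsilon,\pi_\varepsilon}(\mathrm{Reach}(U)) = 1$ (recall Lemma 8). Hence we have $\Pr_s^{\sigma_\varepsilon,\pi_\varepsilon}(e_D < \infty) = 1$. Let us denote by

\[ Q_\varepsilon = \Pr\nolimits_s^{\sigma_\varepsilon,\pi_\varepsilon}(\Theta_{e_D} = \cdot) \]

the law of the exit distribution from $D$ under the strategies $\sigma_\varepsilon$ and $\pi_\varepsilon$. Since $e_D$ is finite almost surely (with probability 1) and the strategies $\sigma_\varepsilon$ and $\pi_\varepsilon$ are perennial $\varepsilon$-optimal strategies we have

\[ \mathrm{E}_{Q_\varepsilon}[v_1(\Theta_{e_D})] \ge v_1(s) - \varepsilon; \qquad \mathrm{E}_{Q_\varepsilon}[v_2(\Theta_{e_D})] \ge v_2(s) - \varepsilon. \]

Since $(\overline{\sigma},D)$ and $(\overline{\pi},D)$ are blocking pairs it follows that for any history $\omega_n = (s_0,s_1,\ldots,s_n)$, we have $\delta(D \mid s_n,\sigma_\varepsilon(\omega_n),\overline{\pi}_{s_n}) = 1$ and $\delta(D \mid s_n,\overline{\sigma}_{s_n},\pi_\varepsilon(\omega_n)) = 1$. Hence $Q_\varepsilon \in \text{convex-hull}(Q_3^D)$. It follows from Solan's result (Lemma 14) that there is an exit distribution $Q \in \text{convex-hull}(Q_3^D)$ such that

\[ \mathrm{E}_Q[v_1(\Theta_{e_D})] \ge v_1(s); \qquad \mathrm{E}_Q[v_2(\Theta_{e_D})] \ge v_2(s). \]

Since $Q$ involves no unilateral exit it follows from Solan's result (Lemma 14) that $(D,Q)$ is controllable for all $\gamma \ge v = (v_1,v_2)$. ■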

Reduction sequence. It follows from Lemma 13 and Lemma 15 that if there is a blocking pair in a game then there is a controlled set. The analysis of Vieille (Lemmas 41-45 of [29]) presents a mechanism to collapse the controlled sets of the game to get a game $\mathcal{G}^*$ such that there is no reduced blocking pair in the game $\mathcal{G}^*$. The key idea is as follows: (a) let the original game be $\mathcal{G}_0 = \mathcal{G}_R$; if $\mathcal{G}_0$ has no blocking pair then $\mathcal{G}^* = \mathcal{G}_0$; (b) else there is a sequence of controlled sets $C_0, C_1, \ldots$ such that each $C_i$ is a maximal controlled set, and the game can be reduced by collapsing the controlled sets in sequence, i.e., $\mathcal{G}_{i+1}$ is obtained from $\mathcal{G}_i$ by collapsing the controlled set $C_i$. Since $\mathcal{G}_{i+1}$ has fewer moves than $\mathcal{G}_i$ and the move set is finite, the process stops after a finite number of steps (say $k$ steps). The analysis of Vieille shows that the game $\mathcal{G}^* = \mathcal{G}_{k+1}$ has no reduced blocking pair.

Lemma 16 Let $\mathcal{G}^*$ be the game with no reduced blocking pair and let the transition function of the game be $\delta^*$. The following assertions hold in the game $\mathcal{G}^*$:

1. For every memoryless strategy $x$ for player 1 there is a memoryless strategy $y$ for player 2 such that $\Pr_s^{x,y}(\mathrm{Reach}(U)) = 1$, and $\Pr_s^{x,y}(\mathrm{Reach}(W_2)) \ge v_2(s)$.

2. For every memoryless strategy $y$ for player 2 there is a memoryless strategy $x$ for player 1 such that $\Pr_s^{x,y}(\mathrm{Reach}(U)) = 1$, and $\Pr_s^{x,y}(\mathrm{Reach}(W_1)) \ge v_1(s)$.

Proof. We prove the result for case 1; the argument for case 2 is symmetric. Let a memoryless strategy $x$ for player 1 be given. For $s \in S$ define

\[ B(s) = \{\, b \in \Gamma_2(s) \mid \mathrm{E}_{\delta^*}[v_2(\Theta_1) \mid s,x_s,b] \ge v_2(s) \,\}. \]

Note that for any controlled set $C$ that is reduced we have $\mathrm{E}_{\delta^*}[v_2(\Theta_1) \mid s,*,*] = \mathrm{E}_{Q_C}[v_2(\Theta_{e_C})] \ge v_2(s)$, where $Q_C$ is the controllable exit distribution from $C$. Hence it follows that $B(s) \ne \emptyset$ for all states $s$. Choose a $y$ such that $\mathrm{Supp}(y_s) = B(s)$. Consider the Markov chain under the memoryless strategy pair $(x,y)$. Consider any arbitrary $F \subseteq C$. Let $\overline{F} = \{\, s \in F \mid v_2(s) = \max_{s' \in F} v_2(s') \,\}$. Since there is no reduced blocking pair in the game $\mathcal{G}^*$, we must have for some state $s \in \overline{F}$ and some move $b$ that $\delta^*(\overline{F} \mid s,x_s,b) < 1$ and $\mathrm{E}_{\delta^*}[v_2(\Theta_1) \mid s,x_s,b] \ge \max_{s' \in \overline{F}} v_2(s')$. Since $\overline{F}$ consists of the set of states of $F$ with maximum value for player 2, we have that if $\delta^*(\overline{F} \mid s,x_s,b) < 1$ and $\mathrm{E}_{\delta^*}[v_2(\Theta_1) \mid s,x_s,b] \ge \max_{s' \in \overline{F}} v_2(s')$, then $\delta^*(F \mid s,x_s,b) < 1$. Hence no subset $F \subseteq C$ is closed. Since under $(x,y)$ we have a Markov chain it follows that $\Pr_s^{x,y}(\mathrm{Reach}(U)) = 1$.

Finally observe that $\Pr_s^{x,y}(\mathrm{Reach}(W_2)) \ge v_2(s)$, since $(v_2(\Theta_n))_n$ is a sub-martingale under $(x,y)$. ■

Recall that $W_1 = \{ t_{00}, t_{01} \}$ and $W_2 = \{ t_{00}, t_{10} \}$ and the game $\mathcal{G}_R$ is the nonzero-sum reachability game with objective $\mathrm{Reach}(W_1)$ for player 1 and $\mathrm{Reach}(W_2)$ for player 2.

Lemma 17 The following assertions hold:

1. For every $\varepsilon > 0$, there is an $\varepsilon$-Nash equilibrium $(\sigma^*_\varepsilon,\pi^*_\varepsilon)$ in the nonzero-sum reachability game $\mathcal{G}^*$ such that $\Pr_s^{\sigma^*_\varepsilon,\pi^*_\varepsilon}(\mathrm{Reach}(U)) = 1$; and $\Pr_s^{\sigma^*_\varepsilon,\pi^*_\varepsilon}(\mathrm{Reach}(W_1)) \ge v_1(s) - \varepsilon$ and $\Pr_s^{\sigma^*_\varepsilon,\pi^*_\varepsilon}(\mathrm{Reach}(W_2)) \ge v_2(s) - \varepsilon$.

2. For every $\varepsilon > 0$, there is an $\varepsilon$-Nash equilibrium $(\sigma^*,\pi^*)$ in the nonzero-sum reachability game $\mathcal{G}_R$ such that $\Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}(U)) = 1$; and $\Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}(W_1)) \ge v_1(s) - \varepsilon$ and $\Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}(W_2)) \ge v_2(s) - \varepsilon$.

Proof.

1. For $\varepsilon > 0$, let $(\sigma^*_\varepsilon,\pi^*_\varepsilon)$ be a memoryless $\varepsilon$-Nash equilibrium in the game $\mathcal{G}^*$. The existence of memoryless $\varepsilon$-Nash equilibria in two-player nonzero-sum games with reachability objectives follows from [5]. The result then follows from Lemma 16.

2. It follows from the definition of controlled sets that if $(\sigma^*_{i+1},\pi^*_{i+1})$ is an $\varepsilon$-Nash equilibrium, with $\varepsilon \to 0$, in the game $\mathcal{G}_{i+1}$ satisfying the assumptions of part 1, then the following strategy profile $(\sigma^*_i,\pi^*_i)$ is an $\varepsilon$-Nash equilibrium in $\mathcal{G}_i$:

(a) Let $C_i$ be the controlled set collapsed in game $\mathcal{G}_i$ to obtain game $\mathcal{G}_{i+1}$. Then the strategy profile is as follows: play $(\sigma^*_{i+1},\pi^*_{i+1})$ for all states in $S_i \setminus C_i$ and play the controllable exit distribution $(\sigma,\pi)$ at every state in $C_i$. Then the strategy $(\sigma^*_i,\pi^*_i)$ is an $\varepsilon$-Nash equilibrium satisfying the required assumptions. By induction the result follows for $\mathcal{G} = \mathcal{G}_0$.

The desired result follows. ■


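Lemma 18 For every $\varepsilon > 0$, there is an $\varepsilon$-Nash equilibrium $(\sigma^*,\pi^*)$ in the nonzero-sum reachability game $\mathcal{G}_R$, and there exists $k \in \mathbb{N}$ such that

1. $\Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}^k(U)) \ge 1 - \varepsilon$;

2. for every history $\omega \in \mathrm{Outcome}(s,\sigma^*,\pi^*)$, if $\omega_k = s_k$, then

(a) $\Pr_{s_k}^{\sigma^*,\pi^*}(\mathrm{Reach}(W_1)) \ge v_1(s_k) - \varepsilon$;

(b) $\Pr_{s_k}^{\sigma^*,\pi^*}(\mathrm{Reach}(W_2)) \ge v_2(s_k) - \varepsilon$.

Proof. Let us denote the $\varepsilon$-Nash equilibrium profile satisfying the conditions of Lemma 17 by $(\sigma,\pi)$, i.e., $\Pr_s^{\sigma,\pi}(\mathrm{Reach}(U)) = 1$. Hence for $\varepsilon > 0$, there exists $k$ such that $\Pr_s^{\sigma,\pi}(\mathrm{Reach}^k(U)) \ge 1 - \varepsilon$. The strategy $(\sigma^*,\pi^*)$ is defined as follows:

• $(\sigma^*,\pi^*) = (\sigma^k + \sigma, \pi^k + \pi)$, i.e., the players play $(\sigma,\pi)$ for $k$ steps and then restart $(\sigma,\pi)$. Formally, for a history $\omega = (s_0,s_1,\ldots)$ we have

\[ \sigma^*(s_0 s_1 \cdots s_n) = \begin{cases} \sigma(s_0 s_1 \ldots s_n) & \text{if } n < k \\ \sigma(s_k \cdots s_n) & \text{if } n \ge k \end{cases} \qquad \pi^*(s_0 s_1 \cdots s_n) = \begin{cases} \pi(s_0 s_1 \ldots s_n) & \text{if } n < k \\ \pi(s_k \cdots s_n) & \text{if } n \ge k \end{cases} \]

Since for all states $s$ the following conditions hold:

1. $\Pr_s^{\sigma,\pi}(\mathrm{Reach}(U)) = 1$;

2. $\Pr_s^{\sigma,\pi}(\mathrm{Reach}(W_1)) \ge v_1(s) - \varepsilon$; and

3. $\Pr_s^{\sigma,\pi}(\mathrm{Reach}(W_2)) \ge v_2(s) - \varepsilon$;

it follows that $(\sigma^*,\pi^*)$ is an $\varepsilon$-Nash equilibrium with the desired property. ■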

Theorem 1 ($\varepsilon$-Nash equilibrium in Sscc game) Let $\mathcal{G}$ be a Sscc game with parity objective $\Psi_1$ for player 1 and $\Psi_2$ for player 2. For every $\varepsilon > 0$, there is an $\varepsilon$-Nash equilibrium for every state $s \in C$.


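Proof. In the case when properties P1-P4 of Lemma 1 hold, the result follows from Lemma 1. In the case when properties P1-P4 are not satisfied, we consider the nonzero-sum reachability game $\mathcal{G}_R$. We obtain an $\varepsilon$-Nash equilibrium in the original game by considering the $\varepsilon$-Nash equilibrium of the reachability game $\mathcal{G}_R$ and then using spoiling strategies.

Fix an arbitrary $\varepsilon > 0$; we show that there is a $3\varepsilon$-Nash equilibrium for every state $s \in C$. Since $\varepsilon$ is arbitrary the result follows. Let $(\sigma^*,\pi^*)$ be an $\varepsilon$-Nash equilibrium of the reachability game $\mathcal{G}_R$ as constructed in Lemma 18. Consider the strategy $\sigma^*_\varepsilon$ for player 1 defined as follows:

\[ \sigma^*_\varepsilon(s_0,s_1,\ldots,s_l) = \begin{cases} \sigma^*(s_0,s_1,\ldots,s_l) & \text{if } l < k \ (k \text{ of Lemma 18}) \\ \sigma_\varepsilon(s_0,s_1,\ldots,s_l) & \text{if } l \ge k \ (k \text{ of Lemma 18 and } \sigma_\varepsilon \in \Sigma_\varepsilon) \end{cases} \]

i.e., player 1 plays $\sigma^*$ for $k$ steps and then switches to an $\varepsilon$-spoiling strategy $\sigma_\varepsilon$. Similarly, the strategy $\pi^*_\varepsilon$ for player 2 is defined as follows:

\[ \pi^*_\varepsilon(s_0,s_1,\ldots,s_l) = \begin{cases} \pi^*(s_0,s_1,\ldots,s_l) & \text{if } l < k \ (k \text{ of Lemma 18}) \\ \pi_\varepsilon(s_0,s_1,\ldots,s_l) & \text{if } l \ge k \ (k \text{ of Lemma 18 and } \pi_\varepsilon \in \Pi_\varepsilon) \end{cases} \]

Since $\Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}^k(U)) \ge 1 - \varepsilon$, we have that

\[ \Pr\nolimits_s^{\sigma^*_\varepsilon,\pi^*_\varepsilon}(\Psi_1) \ge \Pr\nolimits_s^{\sigma^*,\pi^*}(\mathrm{Reach}(W_1)) - \varepsilon \]

and

\[ \Pr\nolimits_s^{\sigma^*_\varepsilon,\pi^*_\varepsilon}(\Psi_2) \ge \Pr\nolimits_s^{\sigma^*,\pi^*}(\mathrm{Reach}(W_2)) - \varepsilon. \]

Recall that $(\sigma^*,\pi^*)$ is an $\varepsilon$-Nash equilibrium of the reachability game such that for every history $\omega \in \mathrm{Outcome}(s,\sigma^*,\pi^*)$ we have: if $\omega_k = s_k$ then $\Pr_{s_k}^{\sigma^*,\pi^*}(\mathrm{Reach}(W_1)) \ge v_1(s_k) - \varepsilon$ and $\Pr_{s_k}^{\sigma^*,\pi^*}(\mathrm{Reach}(W_2)) \ge v_2(s_k) - \varepsilon$. Since the players play an $\varepsilon$-spoiling strategy after $k$ steps it follows that

\[ \forall \sigma \in \Sigma.\ \Pr\nolimits_s^{\sigma,\pi^*_\varepsilon}(\Psi_1) \le \Pr\nolimits_s^{\sigma^*,\pi^*}(\mathrm{Reach}(W_1)) + 2\varepsilon \le \Pr\nolimits_s^{\sigma^*_\varepsilon,\pi^*_\varepsilon}(\Psi_1) + 3\varepsilon \]

and

\[ \forall \pi \in \Pi.\ \Pr\nolimits_s^{\sigma^*_\varepsilon,\pi}(\Psi_2) \le \Pr\nolimits_s^{\sigma^*,\pi^*}(\mathrm{Reach}(W_2)) + 2\varepsilon \le \Pr\nolimits_s^{\sigma^*_\varepsilon,\pi^*_\varepsilon}(\Psi_2) + 3\varepsilon. \]

Hence it follows that $(\sigma^*_\varepsilon,\pi^*_\varepsilon)$ is a $3\varepsilon$-Nash equilibrium. ■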
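The two history-dependent constructions used above (switching to a spoiling strategy after $k$ steps in this proof, and restarting the same strategy after $k$ steps in Lemma 18) are simple combinators on strategies. The sketch below is an illustration under assumptions: a strategy is modelled as a Python function from a history tuple $(s_0,\ldots,s_l)$ to an arbitrary value, and the placeholder strategies are hypothetical.

```python
def switch_after(k, first, then):
    """Play `first` on histories (s_0, ..., s_l) with l < k, and `then` once l >= k."""
    def combined(history):
        l = len(history) - 1  # a history with l + 1 states has index l
        return first(history) if l < k else then(history)
    return combined


def restart_after(k, base):
    """Play `base` for k steps, then play `base` again on the suffix (s_k, ..., s_n)."""
    def combined(history):
        n = len(history) - 1
        return base(history) if n < k else base(history[k:])
    return combined


# Hypothetical placeholder strategies returning distributions over moves.
equilibrium = lambda history: {"cooperate": 1.0}
spoiling = lambda history: {"punish": 1.0}

sigma = switch_after(2, equilibrium, spoiling)
print(sigma(("s0", "s1")))        # l = 1 < 2: still the equilibrium strategy
print(sigma(("s0", "s1", "s2")))  # l = 2 >= 2: the spoiling strategy

# A base strategy that only reports the first state of the history it is shown.
first_state = lambda history: history[0]
restarted = restart_after(2, first_state)
print(restarted(("s0", "s1")))
print(restarted(("s0", "s1", "s2", "s3")))
```

The restarted strategy forgets the first $k$ states of the history, which is why the guarantees of Lemma 17 hold again from the state $s_k$.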

4 Existence of $\varepsilon$-Nash equilibrium

In this section we show that for every nonzero-sum concurrent game $\mathcal{G}$ with $\omega$-regular objectives specified as parity objectives $\Psi_1$ and $\Psi_2$ for player 1 and player 2, respectively, for every $\varepsilon > 0$, there exists an $\varepsilon$-Nash equilibrium for every state $s$ of the game $\mathcal{G}$. The proof is by induction on the size of the state space of $\mathcal{G}$ and by application of Theorem 1. We assume without loss of generality that there are four special states $\{ t_{00}, t_{01}, t_{10}, t_{11} \}$ in $\mathcal{G}$, as defined in Definition 10.

Lemma 19 Let $\mathcal{G}$ be a concurrent game with parity objectives $\Psi_1$ and $\Psi_2$ for player 1 and player 2, respectively. Let $G_{\mathcal{G}}$ be the graph of $\mathcal{G}$ and TC be a terminal strongly connected component in $G_{\mathcal{G}}$. Then for every $\varepsilon > 0$, there is an $\varepsilon$-Nash equilibrium for every state $s \in \mathrm{TC}$.

Proof. The proof is by induction on the size of TC. The claim is easy to argue when $|\mathrm{TC}| = 1$, i.e., TC consists of an absorbing state. Consider the sub-game induced by the set of states TC and call the sub-game $\mathcal{G}_{\mathrm{TC}}$.

• Suppose there is a state $s \in \mathrm{TC}$ such that $\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) = 1$. Then fix an $\varepsilon$-optimal strategy $\sigma$ for player 1 and let $\pi$ be an $\varepsilon$-optimal strategy for player 2 against $\sigma$. Then $(\sigma,\pi)$ is an $\varepsilon$-Nash equilibrium. We can replace $s$ by the gadget described in Proposition 2. This will break TC into (possibly many) smaller strongly connected components. By the induction hypothesis, Theorem 1 and the bottom-up evaluation procedure described in Lemma 1 it follows that an $\varepsilon$-Nash equilibrium exists at every state in TC. Similar arguments hold if there is a state $s \in \mathrm{TC}$ such that $\langle\langle 2 \rangle\rangle_{val}(\Psi_2)(s) = 1$.

• Suppose for every state $s \in \mathrm{TC}$ we have $\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) < 1$ and $\langle\langle 2 \rangle\rangle_{val}(\Psi_2)(s) < 1$. It follows from Corollary 1 of [6] that in a zero-sum concurrent game with $\omega$-regular objectives if for every state $s$ we have $\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) < 1$, then for every state $s$ in the game we have $\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) = 0$; i.e., if the zero-sum value is positive for player 1 at some state, then there exists a state $s$ where the zero-sum value is 1. Hence it follows from the above condition that for all states $s \in \mathrm{TC}$ we have $\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) = 0$ and $\langle\langle 2 \rangle\rangle_{val}(\Psi_2)(s) = 0$. Let $\pi_\varepsilon$ be an $\varepsilon$-spoiling strategy for player 2 and $\sigma_\varepsilon$ be an $\varepsilon$-spoiling strategy for player 1. Hence we have the following inequalities:

\[ \forall \sigma \in \Sigma.\ \Pr\nolimits_s^{\sigma,\pi_\varepsilon}(\Psi_1) \le \varepsilon \quad \text{and} \quad \forall \pi \in \Pi.\ \Pr\nolimits_s^{\sigma_\varepsilon,\pi}(\Psi_2) \le \varepsilon. \]

Hence $(\sigma_\varepsilon,\pi_\varepsilon)$ is an $\varepsilon$-Nash equilibrium for all states $s \in \mathrm{TC}$.


Theorem 2 ($\varepsilon$-Nash equilibrium) Let $\mathcal{G}$ be a concurrent game with parity objectives $\Psi_1$ and $\Psi_2$ for player 1 and player 2, respectively. For every $\varepsilon > 0$, there is an $\varepsilon$-Nash equilibrium for every state $s \in S$.

Proof. Let $G_{\mathcal{G}}$ be the graph of $\mathcal{G}$. It follows from Lemma 19 that for any state $s$ in a terminal strongly connected component of $G_{\mathcal{G}}$ there is an $\varepsilon$-Nash equilibrium. By Proposition 2 we can replace every state $s$ of a terminal strongly connected component by the gadget described there. For the rest of the strongly connected components we proceed in a bottom-up order as follows: consider a strongly connected component $C$ when all the strongly connected components below it have been replaced by the gadgets of Proposition 2. The sub-game induced by $C$ and the gadgets of the strongly connected components below $C$ forms a Sscc game. By Theorem 1 there is an $\varepsilon$-Nash equilibrium for every state $s \in C$. ■
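The bottom-up order used in this proof is the order in which Tarjan's algorithm emits strongly connected components: a component is emitted only after every component reachable from it. The sketch below illustrates that ordering only; the dictionary encoding of the game, the helper names and the toy game are assumptions, and the gadget replacement of Proposition 2 is not modelled.

```python
def game_graph(delta):
    """Graph of the game: an edge s -> t whenever some move pair reaches t with positive probability."""
    return {
        s: sorted({t for dist in moves.values() for t, p in dist.items() if p > 0}, key=str)
        for s, moves in delta.items()
    }


def sccs_bottom_up(graph):
    """Tarjan's algorithm; components are returned with terminal components first."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []
    counter = [0]

    def visit(v):  # recursive for brevity; fine for small graphs
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return out


# Hypothetical game: {a, b} is a non-terminal component above the terminal component {c, d}.
delta = {
    "a": {("m", "n"): {"b": 1.0}},
    "b": {("m", "n"): {"a": 0.5, "c": 0.5}},
    "c": {("m", "n"): {"d": 1.0}},
    "d": {("m", "n"): {"c": 1.0}},
}
order = sccs_bottom_up(game_graph(delta))
print(order)
```

Processing the components in this order guarantees that, when a component is handled, every component below it has already been solved and could be replaced by its gadget.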

5  Computational  Complexity 

In this section we show how to compute the values of some $\varepsilon$-Nash equilibrium of Sscc games within $\varepsilon$-precision. We prove that every case of the existence proof of $\varepsilon$-Nash equilibrium is constructive and computable. It may be noted that even in the case of zero-sum concurrent games with parity objectives the values can be irrational (for an example see [7]). Hence, one can only achieve an $\varepsilon$-approximation of the values in the general case of nonzero-sum concurrent parity games. It follows from the inductive argument of Theorem 2 that the values of $\varepsilon$-Nash equilibrium for concurrent games can be computed by $n$ iterations of a procedure to compute $\varepsilon$-Nash equilibrium values for Sscc games.

Complexity of $\varepsilon$-Nash equilibrium in Sscc games. To analyze the complexity of computing values of some $\varepsilon$-Nash equilibrium in Sscc games we consider the following cases:

1. Case 1. Compute the values of $\varepsilon$-Nash equilibrium when the property P1 of Lemma 1 is satisfied.

2. Case 2. Compute the values of $\varepsilon$-Nash equilibrium when the property P4 of Lemma 1 is satisfied.

3. Case 3. Compute the values of $\varepsilon$-Nash equilibrium when the property P2 or P3 of Lemma 1 is satisfied.

4. Case 4. Compute the values of some special $\varepsilon$-Nash equilibrium of Sscc games with reachability objectives.

We  analyze  the  above  cases  below. 

1. Case 1. Given that $\Psi_1$ and $\Psi_2$ are parity objectives, the objective $\Psi_1 \cap \Psi_2$ is a Streett objective [25]. To analyze the computation of $\sup_{(\sigma,\pi) \in \Sigma \times \Pi} \Pr_s^{\sigma,\pi}(\Psi_1 \cap \Psi_2)$, observe that this is equivalent to the computation of values of one-player games (MDPs) where player 1 and player 2 cooperate to achieve the objective $\Psi_1 \cap \Psi_2$. Hence the computation reduces to computing values in an MDP with a Streett objective. This can be achieved in polynomial time [3].

2. Case 2. After the computation of the zero-sum values $v_1(\cdot)$ and $v_2(\cdot)$, it is easy to determine if there is a state $s$ such that $v_1(s) = 0$ and $v_2(s) = 0$. Hence Case 2 can be solved by computing the zero-sum values for player 1 and player 2.

3. Case 3. Given that the zero-sum values for player 1 and player 2 are computed, we describe a polynomial time procedure to determine the values of some $\varepsilon$-Nash equilibrium when property P2 or P3 of Lemma 1 is satisfied. We prove the result for the case when property P2 is satisfied; the case when property P3 is satisfied is symmetric. Consider the set $W = \{\, s \mid v_1(s) = 1 \,\}$ of states that have zero-sum value 1 for player 1. Since property P2 is satisfied, we have $W \cap C \ne \emptyset$. Given a state $s \in W$, consider the set $\mathrm{SafeAct}(s) = \{\, a \in \Gamma_1(s) \mid \forall b \in \Gamma_2(s).\ \mathrm{Dest}(s,a,b) \subseteq W \,\}$ of moves for player 1 that ensure that the set $W$ is never left. Consider a reduced sub-game $\mathcal{G}'$ induced by $W$ such that at every state $s \in W$ the set of available moves for player 1 is $\mathrm{SafeAct}(s)$. Let $\Sigma'$ be the set of strategies such that player 1 plays only moves in $\mathrm{SafeAct}(s)$ for every state $s \in W$, i.e., the set of strategies in $\mathcal{G}'$. We compute the values $val(s) = \sup_{(\sigma,\pi) \in \Sigma' \times \Pi} \Pr_s^{\sigma,\pi}(\Psi_1 \cap \Psi_2)$.

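Observe that there exists an $\varepsilon$-optimal strategy $\sigma_\varepsilon$ of the original game such that for every strategy $\pi \in \Pi$ we have $\Pr_s^{\sigma_\varepsilon,\pi}(\Psi_1 \cap \mathrm{Safe}(W)) \ge 1 - \varepsilon$, for all states $s \in W$. Hence it follows that

\[ \Pr\nolimits_s^{\sigma_\varepsilon,\pi}(\Psi_2) \le \Pr\nolimits_s^{\sigma_\varepsilon,\pi}\big((\Psi_1 \cap \mathrm{Safe}(W)) \cap \Psi_2\big) + \varepsilon \le \sup_{(\sigma,\pi) \in \Sigma' \times \Pi} \Pr\nolimits_s^{\sigma,\pi}(\Psi_1 \cap \Psi_2) + \varepsilon. \tag{1} \]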

• If for some state $s \in W \cap C$ we have $val(s) = 1$, then property P1 of Lemma 1 is satisfied and then Case 1 is followed.

• Else for every state $s \in W \cap C$ we have $val(s) < 1$. It follows from a property of MDPs that for any $\omega$-regular objective $\Psi$ the maximum probability to satisfy $\Psi$ is equal to the maximum probability of reaching the set of states where the value is 1. Hence we have

\[ \sup_{(\sigma,\pi) \in \Sigma' \times \Pi} \Pr\nolimits_s^{\sigma,\pi}(\Psi_1 \cap \Psi_2) = \sup_{(\sigma,\pi) \in \Sigma' \times \Pi} \Pr\nolimits_s^{\sigma,\pi}(\mathrm{Reach}(t_{00})) \tag{2} \]

We show that for every state $s \in W \cap C$, the profile $(1, val(s))$ is the value of some $\varepsilon$-Nash equilibrium profile. Let $(\sigma,\pi)$ be a memoryless strategy pair such that $\Pr_s^{\sigma,\pi}(\mathrm{Reach}(t_{00})) = val(s)$, for all $s \in W \cap C$. The existence of such a memoryless strategy follows from [4]. For any $\varepsilon > 0$, let $k \in \mathbb{N}$ be such that $\Pr_s^{\sigma,\pi}(\mathrm{Reach}^k(t_{00})) \ge val(s) - \varepsilon$. The strategy profile $(\sigma^*,\pi^*)$ is described as follows:

\[ \sigma^*(s_0,s_1,\ldots,s_l) = \begin{cases} \sigma(s_0,s_1,\ldots,s_l) & \text{if } l < k \\ \sigma_\varepsilon(s_0,s_1,\ldots,s_l) & \text{if } l \ge k,\ \sigma_\varepsilon \in \Sigma_\varepsilon \end{cases} \]

and $\pi^* = \pi$. Given the strategy $\sigma^*$, for any strategy $\pi$ the play never leaves $W$ within $k$ steps, since $\sigma \in \Sigma'$. Since $\sigma_\varepsilon \in \Sigma_\varepsilon$ and for every state $s \in W$ we have $\langle\langle 1 \rangle\rangle_{val}(\Psi_1)(s) = 1$ it follows that $\Pr_s^{\sigma^*,\pi^*}(\Psi_1) \ge 1 - \varepsilon$. Since $\sigma^*$ follows $\sigma$ for $k$ steps, it follows that $\Pr_s^{\sigma^*,\pi^*}(\Psi_2) \ge \Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}(t_{00})) - \varepsilon$. It follows from equations (1) and (2) that $\sup_{\pi \in \Pi} \Pr_s^{\sigma^*,\pi}(\Psi_2) \le \Pr_s^{\sigma^*,\pi^*}(\mathrm{Reach}(t_{00})) + \varepsilon$. Hence it follows that $(1, val(s))$ is an $\varepsilon$-Nash equilibrium value profile for all states $s \in W \cap C$, for all $\varepsilon > 0$.

It follows from the above that, when Case 1, Case 2 or Case 3 applies, the values of some $\varepsilon$-Nash equilibrium at states $s \in C$ can be computed by a polynomial time procedure together with the computation of the zero-sum values for player 1 and player 2. The analysis of Case 4 involves computing some special $\varepsilon$-Nash equilibrium values of a game with reachability objectives. We argue below the existence of a polynomial witness and a polynomial time verification procedure for Case 4.

4. Case 4. The polynomial witness and the polynomial-time verification
procedure for the witness consist of the analysis of the following two
cases:

(a)  Witness  and  verification  procedure  for  the  reduction  sequence 
that  is  described  after  Lemma  15. 

(b) Witness and verification procedure for an ε-Nash equilibrium in the
reachability game 𝒢*, when the reduction sequence terminates.

Observe that in the reduction sequence defined after Lemma 15, the
length of the sequence is linear in the size of the game graph.
This follows since every reduction step decreases the number of
moves by at least 1. We show that there is a polynomial witness for
every reduction step and thereby establish the existence of a polynomial
witness for the entire reduction sequence. The polynomial witness for a
reduction step consists of a controlled set. It follows from the results
of [5, 12] that any memoryless strategy (or a memoryless distribution)
can be suitably ε-approximated by k-uniform memoryless strategies,
where a k-uniform memoryless strategy is a strategy that assigns
probabilities to every move as multiples of 1/ℓ, where ℓ ≤ k. Moreover,
k is polynomial in |𝒢| and 1/ε. We now analyze the following cases to
provide the polynomial witness and verification procedure for a
controlled set.

(a) If Case 1 of Lemma 13 holds then the witness of a controlled set
consists of a set D and a memoryless strategy pair x and π such that

i. (x, D) is a blocking pair for player 2;

ii. v_2(·) is constant in D;

iii. D is weak-communicating under (x, π).
Since x and π are memoryless strategies, the witnesses can be
ε-approximated by k-uniform memoryless strategies and the
witnesses are polynomial. It is easy to verify that (x, D) is a
blocking pair by verifying that for every state s ∈ D and every move b,
if δ(D | s, x_s, b) < 1 then
E[v_2(Θ_1) | s, x_s, b] < max_{s'∈D} v_2(s'). Since the zero-sum
values v_2(·) are computed, the verification is achieved in
polynomial time. Again, since the zero-sum values v_2(·) are
computed, it is easy to verify that v_2(·) is constant over D.
To conclude that D is weak-communicating under (x, π) it is
sufficient to construct the perturbed graph G_D(x, π) and verify
that D is strongly connected. The last step of the verification
procedure checks that for some state s ∈ D and a ∈ Γ_1(s) we have
δ(D | s, a, π_s) < 1 and
E[v_1(Θ_1) | s, a, π_s] > max_{s'∈D} v_1(s'). The procedure then
chooses (s*, a*) that maximizes E[v_1(Θ_1) | s, a, π_s].
The controlled exit distribution, as described in Lemma 13, then
consists of playing the distribution x with perturbation to a*.
This establishes the existence of a polynomial witness and a
polynomial-time verification procedure for the case when part 1 of
Lemma 13 holds. Similar arguments hold for the symmetric
case when the condition holds for player 2. Otherwise, part 2
of Lemma 13 holds. We analyze this case below.

(b) If part 2 of Lemma 13 holds then there are blocking pairs (σ, D)
and (π, D) such that D is weak-communicating under (σ, π).
Since σ and π are memoryless strategies, arguments analogous to
the previous case prove the existence of a polynomial witness and a
polynomial-time verification procedure for the above condition.
It follows from Lemma 15 that if the above condition holds then
there is a controlled exit distribution Q ∈ convex-hull(Q_D(σ, π)).
Since Q is a memoryless distribution it can be ε-approximated by
a k-uniform memoryless strategy such that k is polynomial in |𝒢|
and 1/ε. To verify that Q is a controlled exit distribution the
conditions of Lemma 14 need to be verified. The verification resembles
the analysis of a Markov chain under the distribution Q. Since
Q can be approximated by a polynomial k-uniform memoryless
strategy, the verification is achieved in polynomial time.
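
The two checks used in the verification of a controlled set can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the paper: the transition function is given as a dictionary `delta[(s, a, b)]` of successor probabilities, the blocking condition is taken with a strict inequality, and the edge relation of the perturbed graph is supplied as input, since its construction is not reproduced here. All function and variable names are hypothetical.

```python
def successor_distribution(delta, s, x_s, b):
    """Successor distribution at s when player 1 mixes according to x_s
    and player 2 plays the move b."""
    dist = {}
    for a, pa in x_s.items():
        for t, pt in delta[(s, a, b)].items():
            dist[t] = dist.get(t, 0.0) + pa * pt
    return dist


def is_blocking_pair(delta, moves2, x, D, v2, tol=1e-9):
    """Check: whenever a move b of player 2 leaves D with positive
    probability, the expected value of v2 is strictly below max of v2 on D."""
    best = max(v2[s] for s in D)
    for s in D:
        for b in moves2[s]:
            dist = successor_distribution(delta, s, x[s], b)
            stay = sum(p for t, p in dist.items() if t in D)
            if stay < 1.0 - tol:
                expected = sum(p * v2[t] for t, p in dist.items())
                if expected >= best - tol:
                    return False
    return True


def is_strongly_connected(D, edges):
    """Check that every state of D reaches every other state of D
    along edges that stay inside D."""
    D = set(D)
    if not D:
        return False
    reverse = {}
    for u, successors in edges.items():
        for w in successors:
            reverse.setdefault(w, set()).add(u)

    def reachable(start, adjacency):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adjacency.get(u, ()):
                if w in D and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    root = next(iter(D))
    return reachable(root, edges) == D and reachable(root, reverse) == D


# Hypothetical example: D = {s0, s1}; player 2 can exit to t from s0.
delta = {
    ("s0", "a", "stay"): {"s1": 1.0},
    ("s0", "a", "exit"): {"t": 1.0},
    ("s1", "a", "stay"): {"s0": 1.0},
}
moves2 = {"s0": ["stay", "exit"], "s1": ["stay"]}
x = {"s0": {"a": 1.0}, "s1": {"a": 1.0}}
D = {"s0", "s1"}
v2 = {"s0": 0.5, "s1": 0.5, "t": 0.2}
edges = {"s0": ["s1", "t"], "s1": ["s0"], "t": ["t"]}
print(is_blocking_pair(delta, moves2, x, D, v2), is_strongly_connected(D, edges))
```

Both checks run in time polynomial in the number of states and moves, which is the property the argument above relies on. Checking that the zero-sum value is constant over D is a single pass over the states of D.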

It follows from the above that at every reduction step the witness of a
controlled set is polynomial and can be verified in polynomial time.
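
The notion of a k-uniform distribution used in the witnesses above can be made concrete as follows. This is a minimal sketch: the largest-remainder rounding shown here is an assumption chosen for illustration and is not the approximation argument behind the results of [5, 12]. The function name and the example distribution are hypothetical.

```python
from fractions import Fraction
from math import floor


def round_to_k_uniform(dist, k):
    """Round a probability distribution over moves to one whose
    probabilities are all multiples of 1/k and still sum to 1.
    Assumes the input probabilities sum to exactly 1."""
    scaled = {m: Fraction(p) * k for m, p in dist.items()}
    counts = {m: floor(q) for m, q in scaled.items()}
    # Distribute the units lost by rounding down to the moves with the
    # largest fractional parts (largest-remainder method).
    leftover = k - sum(counts.values())
    by_remainder = sorted(dist, key=lambda m: scaled[m] - counts[m], reverse=True)
    for m in by_remainder[:leftover]:
        counts[m] += 1
    return {m: Fraction(c, k) for m, c in counts.items()}


# Hypothetical distribution over three moves, rounded with k = 4.
x_s = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
rounded = round_to_k_uniform(x_s, 4)
print(rounded)
```

With this rounding every probability changes by less than 1/k, and the rounded distribution is described by one integer in the range 0 to k per move. This is why a witness built from k-uniform strategies has polynomial size when k is polynomial.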

We now consider the case when the reduction sequence terminates and
an ε-Nash equilibrium of the reachability game 𝒢* needs to be computed
(recall Lemma 16). The existence of a polynomial witness and a
polynomial-time verification procedure to compute values of such an
ε-Nash equilibrium follows from [5].

Let ZeroSum(𝒢, ε) denote the time complexity of an algorithm to
compute the zero-sum values of a concurrent parity game within
ε-precision. Let NonzeroSumReachability(𝒢, ε) denote the complexity of
an algorithm to compute the values of some ε-Nash equilibrium, greater
than some specified value, of a concurrent game with reachability
objectives. It follows from [2] and [5] that there exist algorithms for
ZeroSum(𝒢, ε) and NonzeroSumReachability(𝒢, ε) that are in the
complexity class FNP, for constant ε > 0. The above analysis gives us
the following theorem on the complexity of computing the values of some
ε-Nash equilibrium in concurrent games with parity objectives.

Theorem 3 (Complexity of ε-Nash equilibrium) Let 𝒢 be a two-player
concurrent game structure with n states. Then the following assertions
hold:

1. The value of some ε-Nash equilibrium of a nonzero-sum concurrent
game with parity objectives can be computed in time

\[
O\bigl(n \cdot (\mathrm{ZeroSum}(\mathcal{G}, \varepsilon) + \mathrm{NonzeroSumReachability}(\mathcal{G}, \varepsilon))\bigr) + O(p(|\mathcal{G}|))
\]

where  p  is  a  polynomial  function. 

2. For every constant ε > 0, the values of some ε-Nash equilibrium
of a nonzero-sum concurrent game with parity objectives can be
ε-approximated in FNP, and hence in EXPTIME.

6  Conclusion 

In the case of two-player concurrent games we extend the existence of
ε-Nash equilibria, for every ε > 0, from safety and reachability
objectives to the class of ω-regular objectives. Our analysis also shows
that the computation of values of some ε-Nash equilibrium can be reduced
to two simpler problems: (a) computing values of zero-sum games; and
(b) computing values of ε-Nash equilibria of nonzero-sum reachability
games. The result can possibly be extended in two directions:

1. More players. The existence of ε-Nash equilibria, for all ε > 0,
in n-player games with ω-regular objectives remains an open problem.
The problem is likely to require a more involved analysis. In the case
of n-player games with safety objectives, the existence proof for Nash
equilibria critically relies on the existence of finite counterexamples
for safety objectives. In the case of n-player games with reachability
objectives, the existence of ε-Nash equilibria, for all ε > 0, is
achieved by analyzing a discounted version of the original game.
Unfortunately, both of the above ideas fail for infinitary objectives
like ω-regular objectives. In the case of n-player games the application
of punishing strategies is complicated and, to the best of the authors'
knowledge, no general result is known for the existence of ε-Nash
equilibria in n-player games that is achieved by applying punishing
strategies.

2. More objectives. The existence of ε-Nash equilibria, for all ε > 0,
in two-player games with objectives in higher levels of the Borel
hierarchy than ω-regular objectives remains another open problem.